C SKY V2 CPU Applications Binary Interface Standards Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 78

Download
Open PDF In Browser	View PDF

C-SKY V2 CPU Applications Binary
Interface Standards Manual
Release 2.1

csky

Nov 15, 2018

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.
This document is the property of Hangzhou C-SKY MicroSystems Co.,Ltd. This document may only be
distributed to: (i) a C-SKY party having a legitimate business need for the information contained herein,
or (ii) a non-C-SKY party having a legitimate business need for the information contained herein. No
license, expressed or implied, under any patent, copyright or trade secret right is granted or implied by
the conveyance of this document. No part of this document may be reproduced, transmitted, transcribed,
stored in a retrieval system, translated into any language or computer language, in any form or by any
means, electronic, mechanical, magnetic, optical, chemical, manual, or otherwise without the prior written
permission of C-SKY MicroSystems Co.,Ltd.
Trademarks and Permissions
The C-SKY Logo and all other trademarks indicated as such herein are trademarks of Hangzhou C-SKY
MicroSystems Co.,Ltd. All other products or service names are the property of their respective owners.
Notice
The purchased products, services and features are stipulated by the contract made between C-SKY and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided ”AS IS” without warranties, guarantees
or representations of any kind, either express or implied. The information in this document is subject to
change without notice. Every effort has been made in the preparation of this document to ensure accuracy
of the contents, but all statements, information, and recommendations in this document do not constitute a
warranty of any kind, express or implied.

Hangzhou C-SKY MicroSystems Co.,LTD
Address: 15 Story of Building A, Tiantang software center,XiDouMen road, Xihu district, Hangzhou, China
Post code: 310012
Offical website: www.c-sky.com

i

Contents

1 About this Document
1.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3 References . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4 Current status and anticipated changes . . . . . . . . . .
1.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5.1 Low-Level Run-Time Binary Interface Standards
1.5.2 Object File Binary Interface Standards . . . . . .
1.5.3 Source-Level Standards . . . . . . . . . . . . . . .
1.5.4 Library Standards . . . . . . . . . . . . . . . . .
1.5.5 Change history . . . . . . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

1
1
1
2
2
3
3
3
3
3
3

2 Lower-level Binary interfaces
2.1 Processor Architecture . . . . . . . . . . .
2.1.1 Control Registers in C-SKY V2.0 .
2.1.2 Primary Data Type . . . . . . . . .
2.1.3 Composite Data Type . . . . . . .
2.2 Function Calling Convention . . . . . . . .
2.2.1 Register Assignments . . . . . . . .
2.2.2 Stack Frame Layout . . . . . . . .
2.2.3 Argument Passing . . . . . . . . .
2.2.4 Variable Arguments . . . . . . . . .
2.2.5 Return Values . . . . . . . . . . . .
2.3 Runtime Debugging Support . . . . . . . .
2.3.1 Function Prologues in C-SKY V2.0
2.3.2 Stack Tracing . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

4
4
5
6
8
9
9
11
12
13
14
15
15
16

3 High language Issures
3.1 C preprocessor predefinitions
3.2 Inline assembly syntax . . .
3.2.1 Overview . . . . . .
3.2.2 Basic usage . . . . .
3.2.3 Extended asm . . . .
3.2.4 Examples . . . . . .
3.3 Name mapping . . . . . . . .

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

17
17
17
17
18
18
22
24

4 ELF file format

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

.
.
.
.
.
.
.

25
ii

4.1
4.2

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

25
27
27
28
28
29
29
29
32
43
45
46
46
47
47
50
50
51
51
52
53
54

5 Runtime library
5.1 Compiler assisted Libraries . . . . . . .
5.2 Floating Point Routines . . . . . . . . .
5.2.1 Arithmetic functions . . . . . .
5.2.2 Conversion functions . . . . . .
5.2.3 Comparison functions . . . . .
5.3 Long Long integer Routines . . . . . . .
5.3.1 Arithmetic functions . . . . . .
5.3.2 Comparison functions . . . . .
5.3.3 Trapping Arithmetic Functions
5.3.4 Bit Operations . . . . . . . . .

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

56
56
57
57
58
59
59
60
61
61
61

6 Assembly syntax and directives
6.1 Section . . . . . . . . . . . . . . . . .
6.2 Input line lengths . . . . . . . . . . .
6.3 Syntax . . . . . . . . . . . . . . . . .
6.3.1 Preprocessing . . . . . . . . .
6.3.2 Symbols . . . . . . . . . . . .
6.3.3 Constants . . . . . . . . . . .
6.3.4 Expressions . . . . . . . . . .
6.3.5 Oprators and Precedence . . .
6.3.6 Instruction Memonics . . . . .
6.3.7 Instruction Arguments . . . .
6.4 Assembler directives . . . . . . . . . .
6.4.1 .align abs-exp [, abs-exp] . . .
6.4.2 .ascii “string” {, “string”} . .
6.4.3 .asciz “string” {, “string”} . .
6.4.4 .byte exp {, exp} . . . . . . .
6.4.5 .comm symbol, length [, align]
6.4.6 .data . . . . . . . . . . . . . .
6.4.7 .double float {, float} . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

62
62
62
63
63
63
64
64
64
65
65
66
66
67
67
67
67
67
67

4.3
4.4

4.5
4.6

4.7

4.8

ELF Header . . . . . . . . . . . . .
Section Layout . . . . . . . . . . . .
4.2.1 Section Alignment . . . . .
4.2.2 Section Attributs . . . . . .
4.2.3 Special Sections . . . . . . .
Symbol Table Format . . . . . . . .
Relocation Information Format . . .
4.4.1 Reclocation Fields . . . . .
4.4.2 Relocation Types . . . . . .
Program Loading . . . . . . . . . .
Dynamic Linking . . . . . . . . . . .
4.6.1 Dynamic Section . . . . . .
4.6.2 Global Offset Table . . . . .
4.6.3 Function Address . . . . . .
4.6.4 Procedure Linkage Table . .
PIC Examples . . . . . . . . . . . .
4.7.1 Function proglogue for PIC
4.7.2 Date Objects . . . . . . . .
4.7.3 Function Call . . . . . . . .
4.7.4 Branching . . . . . . . . . .
Debugging Information Format . . .
4.8.1 DWARF Register Numbers

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

iii

6.5

6.4.8 .equ symbol, expression . . . . . . .
6.4.9 .export symbol {, symbol} . . . . . .
6.4.10 .fill count [, size [, value]] . . . . . . .
6.4.11 .float float {, float} . . . . . . . . . .
6.4.12 .ident “string” . . . . . . . . . . . . .
6.4.13 .import symbol {, symbol} . . . . . .
6.4.14 .literals . . . . . . . . . . . . . . . . .
6.4.15 .lcomm symbol, length [, alignment] .
6.4.16 .long exp {, exp} . . . . . . . . . . .
6.4.17 .section name [, “attributes”] . . . .
6.4.18 .short exp {, exp} . . . . . . . . . . .
6.4.19 .text . . . . . . . . . . . . . . . . . .
6.4.20 .weak symbol [, symbol] . . . . . . .
Pseudo-Instructions . . . . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

68
68
68
68
68
68
68
68
69
69
69
69
70
70

iv

CHAPTER

1

About this Document

This chapter would be organized with several sections as follows.
• Abstract
• Purpose
• References
• Current status and anticipated changes
• Overview

1.1 Abstract
This manual defines the C-SKY V2 CPU Applications Binary Interface (ABI). The ABI consists of a serial
of interfaces which the writer of compiler and assembler might follows, as composing tools for the C-SKY
V2 CPU architecture. These standard covers several aspects of whole tool chain, varing from run-time to
object formats, so as to make sure that differnet tool chain implementations of the C-SKY CPU shoule be
compatible and interoperated.
Although compiler supportive routines are provided, this manual does not describe how to write C-SKY V2
CPU development tools, does not define the services provided by an operating system, and does not define
a set of libraries. Those tasks must be performed by suppliers of tools, libraries, and operating systems.

1.2 Purpose
The standards only defined in this manual ensure that all components of development tool for C-SKY V2
CPU (do not include C-SKY V1 CPU) should be fully compatible with each other. Fully compatible tools
could be interoperated, thus, making it is possible to select an optimal tool for each part in the chain instead
of selecting an entire chain on the basis of overall performance. The Technology Center of Hangzhou C-SKY
Microsystems Co., Ltd also provide a test suite to verify compliance with published standards.

1

Chapter 1. About this Document

It is sufficial for developer to follow by this standard. Concretely, the standards ensure that compatible
libraries of binary components can be created and maintained. Such libraries make it is possible for developers
to synthesize applications from binary components, and can make libraries of common services stored in onchip ROM available to applications executing from off- chip ROM. With established standards, developer
can build up libraries over time with the assurance of continued compatibility.
There are two goals required for implemented to conform to the standard.
• Use of interfaces that allow future optimizations for performance and energy.
For example, when possible, registers are used to pass arguments, even though always using the stack
might be easier. Small programs whose working sets fit into the registers are thus not forced to make
unnecessary memory references to the stack just to satisfy the linkage convention.
• Use of interfaces that are compatible with legacy “C” code written for the C-SKY when possible.
For example, whenever possible, C-SKY V2 CPU rules are used to build an argument list. This not
only fits the C-SKY V2 CPU programmer’s expectations, but easily supports

1.3 References

GC++ABI
GDWARF
GABI
GLSB
Open BSD
C-SKY CPU ABI V1.0

Table 1.1: The references
http://www.codesourcery.com/
cxx-abi/abi.html
http://dwarf.freestandards.org/
Dwarf3Std.php
http://www.sco.com/developers/
gabi/
http://www.linuxbase.org/spec/
refspecs/
http://www.openbsd.org/
C-SKY CPU ABI Standards.pdf

Generic C++ ABI
DWARF 3.0, the generic debug
Generic ELF, 17 th December 2003
draft.
gLSB v1.2 Linux Standard Base
Open BSD standard

1.4 Current status and anticipated changes
1. This manual has been released publicly. This manual is meant to be expandable.
2. Anticipated changes to this document include typographical corrections and clarifications.
3. Additional features about C++ ABI would be appended into this document to replect improvment in
the future.
4. Supporting of PE object file format is anticipated to be added to this manual.
5. The Linux system interface for compiled application programs(The ABI for C-SKY V2.0 Linux)is
anticipated to be added to this manual
6. TLS for Linux ABI, Thread Local Storage (TLS) is a class of own data (static storage), like stack,
would be added.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

2

Chapter 1. About this Document

1.5 Overview
Standards in this manual are intended to preclude creation of incompatible development tools for the C-SKY
V2.0, by ensuring binary compatibility between:
• Object modules generated by different tool chains
• Object modules and the C-SKY V2.0 processor
• Object modules and source level debugging tools
Current definitions include the following types of standards.

1.5.1 Low-Level Run-Time Binary Interface Standards
• Processor specific binary interface, such as the instruction set, representation of primitive data types,
and exception handling
• Function calling convention that the method of passing arguments and returning result on calling to
another function arguments are passed and results are returned. This manual will specify how the
arguemnt should be passed by register or stack slot according to its type.

1.5.2 Object File Binary Interface Standards
• Header convention
• Section layout
• Symbol table format
• Relocation information format
• Debugging information format

1.5.3 Source-Level Standards
• C language, e.g. preprocessor predefines, in-line assembly, and name mapping.
• Assembly, e.g. the syntax and directives.

1.5.4 Library Standards
• Compiler assist libraries, including some library functions supporting operation on floating point and
long long integer, for instance, addition of two integer of type long long, etc.

1.5.5 Change history

Revision
V2.0
V2.1

Release 2.1

Date
2011-12-14
2018-04-13

Table 1.2: Record of Change
Changed by
Description
LiChunQiang
First public release used only for C-SKY V2 CPU
JianpingZeng
Second public release used only for C-SKY V2 CPU

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

3

CHAPTER

2

Lower-level Binary interfaces

In order to served as a well documented index, this chapter would be splitted into following several different
sections.
• Processor Architecture
• Function Calling Convention
• Runtime Debugging Support

2.1 Processor Architecture
C-SKY processor is a 32-bit high-performance and low-power embedded processor designed for embedded
system or SoC environment. It adopts independently design of architecture and micro-achitecture with
extensible instruction set, which owns great features, e.g. configurable hardware, re-synthesis, easily integration etc. Additionly, it is excellent in power management. It adopts several strategies to reduce power
consumption including statically designed and dynamic power supply management, low voltage supply, entering low power mode and closing internal function modules. Now, C-SKY CPU instruction system has
two versions:
• C-SKY V1
Any CPUs confirmed C-SKY V1.0 Instructions are always 16-bit and are aligned on a 2-byte boundary.
There are two sub-serials, CK500 & CK600. The serial of CK500 include CK510, Ck520, CK510(ES),
and CK600 include CK610, ck620 and ck610(ESM-F). CK510 is the first generation of C-SKY IP. Also
CK610 is the second generation of C-SKY IP which is more efficient than CK510. CK520/CK620 adds
OMFLIP, MAC, MTLO, MTHI, MFHI and MFLO instructions based on CK510/CK610 instruction
set.
’E’ means DSP enhancement, ‘S’ means SPM, ‘M’ means MMU, and ‘-F’ means supporting of Float
Point. Pelease consult the CK500 & CK600 Reference Manual to view description for detailed information.
• C-SKY V2

4

Chapter 2. Lower-level Binary interfaces

The 2nd generation of instruction set of CK-CPU, which has more power and extensible instructions
set than CK500 & CK600, even though second one is compatible with CK500 & CK600 in the level of
assemble language. C-SKY V2.0 instruction set is the freely mixture of 32-bit and 16-bit instruction,
and it’s alignment boundary is two bytes.
What’s important is:
– Most of 16-bit instructions have been limited to only access 8 of partial general- purpose registers,
r0-r7, known as the low registers. A few number of 16-bit instructions have the legal accessibility
to the high registers, r8-r15.
– In the most of cases, operations should be accomplished by at least two 16-bit instructions so as
to gain more efficiency.
You must note that the C-SKY V2.0 instruction sets are not freely exchangebale with V1.0. Conversely,
available function provided by V2.0 is identical to V1.0 for most of applicatios. So that we strongly recommend that you should make sure you are aware of the generated result of specified application when you use
them stimuleously. The two instruction sets differ in how instructions are encoded:
The standards defined in this manual ensure that all parts of development tools for C-SKY V2 CPU (do not
include C-SKY V1 CPU) would be fully compatible.

2.1.1 Control Registers in C-SKY V2.0
The C-SKY ABI V2 defines an array of rules illustrating the developer should how to use the 32
general-purpose 32-bit registers of the C-SKY V2.0 processor. These registers are named r0~r31 or
a0~a6/t0~t10/l0~l10/gb/sp/lr. C-SKY V2.0 Co-processor 0 has up to 32 control registers. These registers are named cr0 through cr31. The control registers are shown in Table 2.1. These control registers can
access with mtcr/mfcr instructions.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

5

Chapter 2. Lower-level Binary interfaces

Table 2.1: C-SKY V2 Controls Register
Register Use Convention
Reg
Name
Function
cr0
psr, cr0
Processor Status Register
cr1
vbr,cr1
Vector Base Register
cr2
epsr,cr2
Shadow Exception PSR
cr3
fpsr,cr3
Shadow Fast Interrpt PSR
cr4
epc,cr4
Shadow Exception Program Counter
cr5
fpc,cr5
Shadow Fast Interrupt PC
cr6
ss0,cr6
Supervisor Scratch Register
cr7
ss1,cr7
Supervisor Sratch Register
cr8
ss2,cr8
Supervisor Scratch Regsiter
cr9
ss3,cr9
Supervisor Scratch Register
cr10
ss4,cr10
Supervisor Scratch Register
cr11
gcr,cr11
Global Control Register
cr12
gsr,cr2
Global Status Register
cr13
cpidr
Product ID Register
cr14
cr14
Rerserved
cr15
cr15
Rerserved
cr16
cr16
Rerserved
cr17
cfr
Cache Flush Register
cr18
ccr
Cache Config Register
cr19
capr
Cachable and Access Popedom Register(MGU processor only)
cr20
pacr
Protected Area Config Register(MGU processor only)
cr21
prsr
Protected Area Select Register(MGU processor only)
cr22-cr31 cr22-cr31 Reserved
The ABI does not mandate the semantics of the C-SKY Hardware Accelerator Interface (HAI) because these
semantics vary between C-SKY implementations based on particular chips. C-SKY V2 provides instruction
encodings to move, load, and store values for up to other 15 co-processors (except for co-processor 0).

2.1.2 Primary Data Type
The C-SKY processor works with the following raw data types:
1. unsigned byte of eight bits
2. unsigned halfword of 16 bits
3. unsigned word of 32 bits
4. signed byte of eight bits
5. signed halfword of 16 bits
6. signed word of 32 bits
As the listed above, the data size could be 8-bit bytes, 16-bit halfwords and 32-bit words. The mapping
between these data types and the C language fundamental data type is shown in Table 2.2.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

6

Chapter 2. Lower-level Binary interfaces

Table 2.2: Mapping of C Fundamental Data
Fundamental Data Types
ANSI C
Size(byte) Align
char
1
1
unsigned char
1
1
signed char
1
1
short
2
2
unsigned short
2
2
signed short
2
2
long
4
4
unsigned long
4
4
signed long
4
4
int
4
4
unsigned int
4
4
signed int
4
4
enum
4
4
pointer
4
4
long long
8
8
unsigned long long 8
8
float
4
4
double
8
8
long double
8
8

Types to the C-SKY
C-SKY
unsigned byte
unsigned byte
signed byte
signed halfword
unsigned halfword
signed halfword
signed word
unsigned word
signed word
signed word
unsigned word
signed word
signed word
unsigned word
signed word[2]
unsigned word[2]
unsigned word
unsigned word[2]
unsigned word[2]

Memory access to unsigned byte-sized data is directly supported through both ld.b (load byte) and st.b
(store byte) instruction. Signed byte-sized access requires a sextb (sign extension) instruction after the
ld.b. alternatively, memory access to signed byte-sized data can be directly supported through the ld.bs
(load byte) and st.bs (store byte) instructions. Access to unsigned halfword-sized data is directly supported
through the ld.h (load halfword) and st.h (store halfword) instructions. Signed halfword access requires a
sexth (sign extension) instruction after the ld.h. In the other hand, memory access to signed halvword-sized
data can be directly supported through the ld.hs (load byte) and st.hs (store byte) instructions. Memory
access to word-sized data is supported through ld.w (load word) and st.w (store word) instruction. Also,
ld.w suffices for both signed and unsigned word access because the operation sets all 32 bits of the loaded
register.

Figure 2.1: Data layout in memory

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

7

Chapter 2. Lower-level Binary interfaces

Table 2.3: Data Layout in register
SSSSSSSSSSSSSS S Byte
00000000000000
Byte
SSSSSSS | S halfword
0000000 | Halfword
Byte0 | Byte1
Byte2 | Byte3
C-SKY V2 CPU supports standard two’s complement data formats. The operand size for each instruction is
either explicitly encoded in the instruction (load/store instructions) or implicitly defined by the instruction
operation (index operations, byte extraction). Typically, instructions operate on all 32 bits of the source
operand(s) and generate a 32-bit result.
C-SKY V2 CPU memory might be working in big endian or little endian byte ordering depending
processor configuration (see Figure 2-1 Data Organization in Memory). When configuraed with big
mode (by default), the most significant byte (byte 0) of word 0 is located at address 0. For little
mode, the most significant bye of word 0 is located at address 3. Any data of primitive type is
naturally aligned in memory, i.e., a long is 4-byte aligned, a short is 2-byte aligned.

on the
endian
endian
always

Within registers, bits are numbered within a word starting with bit 31 as the most significant bit (see Figure
2-2 Data Organization in Registers). By convention, byte 0 of a register is the most significant byte regardless
of Endian mode. This is only an issue when executing the xtrb[0-3] instructions.
The C-SKY processor currently does not support the long long int data type with 64-bit operations. However,
compliant compilers must emulate the data type. The long long int data type, both signed and unsigned, is
eight bytes in length and 4-byte aligned.
Requiring long long int support as part of the ABI insures that the feature will exist in all tool chains, so
that application developers can depend on its existence. Because C-SKY processor can only hold a 32 bits
data in a register, long long or double must be held in two registers(like r1,r2), and the most significant
word of long long or double always is held in the upper register(like r2), the other word is held in the lower
register(like r1) for big endian or little endian. when storing in memory, the most significant word of long
long or double always is held in the upper address, the other word is held in the lower address for big
endian or little endian. The C-SKY processor currently support floating point data with coprocessor FPU.
Compliant compilers must support its use. The floating point format to be used is the IEEE standard for
float and double data types. Supportting for the long double data type is optional but must conform to the
IEEE standard format when provided. Alignments are specifically chosen to avoid the possibility of access
faults in the middle of an instruction (with the exception of load/store multiple).

2.1.3 Composite Data Type
There is no two same leaf in the world, compound data types, such as array, structure, union, and bit
fields, have different alignment characteristics. Arrays have the same alignment as their individual elements.
Unions and structures have the most restrictive alignment of their members. A structure containing a char,
a short, and an int must have 4-byte alignment to match the alignment of the int field. In addition, the size
of a union or structure must be an integral multiple of its alignment. Padding must be applied to the end of
a union or structure to make its size a multiple of the alignment. Members must be aligned within a union
or structure according to their type; padding must be introduced between members as necessary to meet this
alignment requirement. Bit fields cannot exceed 32 bits nor can they cross a word (32 bit) boundary. Bit
fields of signed short and unsigned short type are further restricted to 16 bits in size and cannot cross 16-bit
boundaries. Bit fields of signed char and unsigned char types are further restricted to eight bits in size and
cannot cross 8-bit boundaries. Zero-width bit fields pad to the next 8, 16, or 32 bit boundary for char, short,
and int types respectively. Outside of these restrictions, bit fields are packed together with no padding in
between. Bit fields are assigned in big-endian order, i.e., the first bit field occupies the most significant bits

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

8

Chapter 2. Lower-level Binary interfaces

while subsequent fields occupy lesser bits. Unsigned bit fields range from 0 to 2 –1 where “w” is the size in
bits. Signed bit fields range from −2w−1 to 2w−1 − 1. Plain int bit fields are unsigned. Bit fields impose
alignment restrictions on their enclosing structure or union. The fundamental type of the bit field (e.g.,
char, short, int) imposes an alignment on the entire structure. In the following example, the structure more
has 4-byte alignment and will have size of four bytes because the fundamental type of the bit fields is int,
which requires 4byte alignment. The second structure, less, requires only 1-byte alignment because that is
the requirement of the fundamental type (char) used in that structure. The alignments are driven by the
underlying type, not the width of the fields. These alignments are to be considered along with any other
structure members. Struct careful requires 4-byte alignment; its bit fields only require 1-byte alignment, but
the field fluffy requires 4-byte alignment.
struct more
{
int first : 3 ;
unsigned int second : 8 ;
};
struct less
{
unsigned char third : 3 ;
unsigned char fourth : 8 ;
};
struct careful
{
unsigned char third : 3 ;
unsigned char fourth : 8 ;
int fluffy ;
};

each field of structure or union starts on the next possible suitably aligned boundary for their data type.
For non-bit fields, this is a suitable byte alignment. Specially, bit field begin at the next available bit offset
with the following exception: the first bit field after a non-bit field member will be allocated on the next
available byte boundary. In the following example, the offset of the field “c” is one byte. The structure itself
has 4-byte alignment and is four bytes in size because of the alignment restrictions introduced by using the
“int” under- lying data type for the bit field.
struct s
{
int bf : 5;
char c;
};

This act behaves as same as the rules defined by UNIX System V Release 4 ABIs.

2.2 Function Calling Convention
2.2.1 Register Assignments
2.2.1.1 General Registers
In Table 2.4, showing the required register mapping for function calls. Some registers, such as the stack
pointer, have specific purposes, while others are used for local variables, or to transist function call arguments
and return values.
Certain registers are bound to their purpose because specific instructions use them. For instance, subroutine
call instructions write the return address into r15. The instructions used to save and restore registers on
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

9

Chapter 2. Lower-level Binary interfaces

entry and exit from a function use r14 as a base register, making it most appropriate for the stack pointer
register.
Reference to “Argument Passing“ and “Return Values” section for the detailed illustration of how
arguments are passed or how the compiler handle the return value.
Table 2.4: C-SKY V2.0 Register Assignment
Register Use Convention
Name
Software
name
r0-r1
a0-a1
r2-r3
a2-a3
r4-r11
l0-l7
r12-r13
t0-t1
r14
r15
r16-r17
r18-r25

sp
lr
l8-l9
t2-t9

r26
r27
r28

r26
r27
rdb/rgb

r29
r30
r31
pc

rtb
r30/svbr
tls
pc

hi

hi

lo

lo

Usage

Cross-Call Status

Argument Word 1-2/Return Address
Argument Word 3-4
Local
Temporary registers used for expression
evaluation
stack pointer
link
Local
Temporary registers used for expression
evaluation
Linker register
Assembler register
Data section base address /GOT based Address for PIC
Text section base address
Handler Base address
TLS register
Program counter can’t be accessed directly
by instructions
Multiply special register. Holds the most
significant 32 bits of multiply
Multiply special register. Holds the least
significant 32 bits of multiply

Destroyed
Destroyed
Preserved
Destroyed
Preserved
Preserved
Preserved
Destroyed
Reserved
reserved
reserved/Perserved
reserved
reserved
reserved
Destroyed
Destroyed

2.2.1.2 Float Point Registers
The C-SKY V2.0 provides instruction encodings to move, load, and store values for up to 16 co-processors.
Co-processor 1 adds 16 32/64/128-bit floating-point general registers for single / double / SIMD double.
Floating-point data representation is that specified in IEEE Standard for Binary Floating-Point Arithmetic,
ANSI/IEEE Standard 754-1985. Table 2.5 Registers describes the conventions for using the floating-point
registers.

Name
fr0
fr1-fr3
fr4-fr7
fr8-fr15

Release 2.1

Table 2.5: Float point Registers
Usage
Cross-Call status
Argument Word 1/Return Address Destroyed
Argument Word 2-4
Destroyed
Temporary registers
Destroyed
Local registers
Preserved

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

10

Chapter 2. Lower-level Binary interfaces

2.2.1.3 Cross-Call Lifetimes
The 32 general-purpose registers are split between those preserved and those destroyed across function calls.
This balances the need for callers to keep values in registers across calls against the need for simple leaf
subroutines to perform operations without allocating stack space and saving registers. The preserved registers
are called non-volatile registers. The registers that are destroyed are called volatile registers. Registers r4
through r7 are preserved because some 16-bit instructions can only access r0-r7 registers, so we can have a
high performance and code density with 16-bit instructions.
The called subroutine can use any of the argument and scratch registers without concerning for restoring
their values. Preserved registers must be saved before being used and restored before returning to the caller.
While the called function is not specifically required to save and restore r15. On entry to functionm r15
usually contains the return address, so that it’s value should be written into stack slot for making suring
that the program can find the target address after callee is finished. The caller must preserve any essential
data stored in argument and scratch registers. Data in these registers does not survive across function calls.
There is no register dedicated as a frame pointer. For non-alloca() functions, the frame pointer can always
be expressed as an offset from the stack pointer. For alloca() functions and functions with very large frames,
a frame pointer can be synthesized into one of the non-volatile registers.
Eliminating the dedicated frame pointer makes another register available for general use, with a corresponding
improvement in generated code. This affects stack tracing for debugging. See 2.3 Runtime Debugging
Support for additional information.

2.2.2 Stack Frame Layout
The stack pointer points to the bottom (low address) of the stack frame. Space at lower addresses than the
stack pointer is considered invalid and may actually be unaddressable. The stack pointer value must always
be a multiple of eight.
As the Stack Frame Layouts depicted, First() calls Second() which calls Third() shows typical stack frames
for three functions, indicating the relative position of local variables, parameters, and return address. The
outbound argument overflow must be located at the bottom (low address) of the frame. Any incoming
argument spill generated for vararg and stdarg processing must be at the top (high address) of the frame.
Space allocated by Alloca() must reside between the outbound argument overflow and local variable area.
The caller must store argument variables that do not fit in the argument registers in the outbound argument
overflow area. If all outbound arguments fit in registers, this area is not required. A caller may allocate a
succession of argument overflow space sufficient for the worst-case call, use portions of it as necessary, and
not change the stack pointer between calls. The caller must reserve stack space for return variables that
do not fit in the first two argument registers (e.g., structure returns). This return buffer area is typically
adjecent to the local variables. Note that only in the function return structure value, this space would be
allocated.
The caller may store the return address (r15) and the content of other local registers in the register save
area upon entry to the called subroutine. If a called routine does not modify local variables (including r15),
this area is not required.
Local variables that do not fit into the local registers are allocated in the Local Variable area of the stack. If
there are no such variables, this area is not required. Beyond these requirements, a routine is free to manage
its stack frame.
2.2.2.1 Extending the Stack
Stack maintenance is the responsibility of system software. In some environments, it may be benefitial for
compiler to probe the stack as they extend it in order to allow memory protection hardware to provide
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

11

Chapter 2. Lower-level Binary interfaces

Figure 2.2: Stack Frame Layouts
“guard pages”.

2.2.3 Argument Passing
The C-SKY V2 CPU uses four registers (r0–r3) to pass the first four words of arguments from the caller to
the called routine. If additional argument space is required, the caller is responsible for allocating this space
on the stack. This space (if needed by a particular caller) is typically allocated upon entry to a subroutine,
reused for each of the calls made from that subroutine that have more arguments than fit into the four
registers used for subroutine calls, and deallocated only at the caller’s exit point. All argument overflow
allocation and deallocation is the responsibility of the caller.
At entry to a subroutine, the first word of any argument overflow can be found at the address contained in
the stack pointer. Subsequent overflow words are located at successively larger addresses.
2.2.3.1 Scalar Arguments
Arguments are passed using registers r0 through r3, with no more than one argument assigned per register.
Argument values that are smaller than a 32-bit register occupy a full register.
In addition, small argument values are right justified and possibly extended within the register. Small
signed arguments (e.g., shorts) are sign extended; small unsigned arguments (e.g., unsigned shorts) are zero
extended, while other small values (e.g., structures of less than four bytes) are not extended, leaving the upper
bits of the register undefined. The caller is responsible for sign and zero extensions. Small arguments that
are passed via the argument overflow mechanism are placed in the overflow word with the same orientation
they would have if passed in a register; a char is passed in the low-order byte of an overflow word. Such
small overflow arguments need not be sign extended within the argument word as they would be if passed
in a register. Arguments larger than a register must be assigned to multiple argument registers as long as

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

12

Chapter 2. Lower-level Binary interfaces

there are argument registers available. Arguments that would be aligned on 4-byte boundaries in memory
(double, long double, long long, or structures or unions containing a double, long double or long long) can
begin in any numbered register. Once all the argument registers are used, or if there are not enough registers
left to hold a large argument, the argument and any subsequent arguments must be placed in the overflow
area described above.
Large arguments can be split in register and in the overflow area when there are too few argument registers
to hold the entire argument.
The caller is responsible for allocating argument overflow space and for deallocating any space needed for
argument overflow. The only argument space that may be allocated or deallocated by the called routine
is space used to place the register arguments in memory. This may be necessary for stdargs or structure
parameters. Alignment is forced for atomic data types; fundamental data types are not split.
2.2.3.2 Structure Arguments
Structures passed as arguments can be partially or wholly passed through the argument registers. A structure
argument may overflow onto the stack only when all argument registers are full. In these cases, the caller
must adjust the stack pointer to allocate theoverflow area.
Structure arguments that are smaller than 32 bits have their value right justified within the argument register.
The unused upper bits within the register are undefined.
Structure arguments larger than 32 bits are packed into consecutive registers. Structures that are not integral
multiples of 32 bits in size have their final bits left justified within the appropriate register. This allows those
bits to be stored with a 32-bit operation and be adjacent to the preceding portion of the structure.

2.2.4 Variable Arguments
The stdarg C macros provide with a mechanism to handle variable length argument lists. The caller might
not know whether the called function handles variable arguments, so the called routine is responsible for
handling the access to variable argument lists.
2.2.4.1 Spilling Register Arguments
Variable argument lists are most easily handled by spilling one or more of the register arguments so that
they are adjacent to any overflow arguments that are on the stack at function entry.
The typical sequence should extend the stack several words, spill the argument registers after the last named
argument into this space, and then proceed with the normal prologues to allocate a stack frame and save
any non-volatile registers. The stdarg macros can use the address of the first stored argument register for
the va_start macro. The va_arg macro advances this pointer by an amount appropriate to the size of the
type specified.
2.2.4.2 Legacy Code Compatibility
The C-SKY V2 CPU linkage convention provides with a way for variable argument lists to be handled in a
way that is compatible with legacy C code written for processors where the entire argument list is passed in
memory.
The legacy behavior might wastes more instructions, stack slots, and memory references than required by
strict interpretation of the ANSI C standards. Tool generators must provide with this legacy behavior as an
option. It is not required as a default behavior.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

13

Chapter 2. Lower-level Binary interfaces

To obtain compatibility, the called function must spill all the argument registers, rather than just those
beyond the registers that hold the named arguments. This is more pessimistic than required for the stdarg
definitions, but gain the most compatibility.
Spilling is triggered for functions that take the address of any of their arguments. This allows non-standard
varargs code (C code that works on processors with all arguments passed in memory) to run on the C-SKY
V2 CPU.
The spilled arguments are a snapshot of their values at the time the function is entered. This requirement
does not force the compiler to generate code that keeps the “live” value of the parameters in memory. For
example, the following would not be required to print out the value “4”.
void func(int a, int b, int c, ...)
{
int *ip = 0;
use(c);
ip = &b;
ip++;
*ip = 4;
printf("c now has value %d\n", c);
}

The compiler is free to keep the value of c in diffirent location, either register or stack slot. The only
requirement is to save a snapshot of the parameter passing registers (e.g., r0 through r3) during the function
prologue.

2.2.5 Return Values
2.2.5.1 Scalar Values
Subroutines return values in the argument registers. Return values smaller than 32 bits occupy a full register.
These must be right justified and zero or sign extended to 32 bits before return (refer to “Scalar Arguments”).
Return values of 32 bits or fewer are returned in register r0.
Return values between 33 and 64 bits are returned in the register pair r0/r1. The portion of the data
that would reside at a lower address if stored in memory is in r0. For example, r0 would contain the most
significant 32 bits of the long long data type.
Return values larger than eight bytes are treated as structure return values and are returned through memory.
The return value is placed in a caller-supplied buffer. The buffer address is passed from the caller to the
called routine as a hidden first argument in register r0.
2.2.5.2 Structure Values
Structures can be returned in one of two ways. Small structures (eight bytes or fewer) are returned in the
register pair r0/r1. If the structure consists of four or fewer bytes, the value is returned in r0, right justified.
This matches the way it would be justified when passed as an argument. If the structure consists of five to
eight bytes, the first four bytes are returned in r0 and the trailing portion of the structure is returned left
justified in r1.
This alignment is chosen to generate good code for code sequences such as
wom(..., bat(), ...)

where wom takes a structure argument of the same type returned by bat. The only work required is to
perhaps change registers if the call to wom has the structure in some place other than r0/r1.
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

14

Chapter 2. Lower-level Binary interfaces

Structures larger than eight bytes are placed in a buffer provided by the caller. The caller must provide with
a buffer with sufficient size. The buffer is typically allocated on the stack, in order to provide re-entrancy
and to avoid any race conditions where a static buffer may be overwritten. The address of the buffer is
passed to the called function as a hidden first argument and assigned in register r0. The normal arguments
start in register r1 instead of in r0, restricted by as same constraints as fundamental data type.
The caller must provide this buffer for large structures even when the caller does not use the return value
(e.g., the function was called to achieve a side-effect). The called routine can thus assume that the buffer
pointer is valid and need not validate the pointer value passed in r0.
When r0 is used to pass a buffer address, the called routine must preserve the value passed through r0. The
caller can thus assume that r0 is preserved when the buffer address of a large structure is passed in r0. This
is similar to the way where strcat and memcpy return their respective destination addresses.
In generaly, the temporary buffer, used for such structure returns, is immediately used as a source for a
memcpy to a final destination. For example, the sequence
struct s {...}s, sfunc();
s = sfunc();

will often be compiled with sfunc returning into a temporary buffer, which is immediately copied into s.
Although the caller must know the address of the temporary buffer so as to supply it for the called routine,
the address need not be recalculated. In turn, the called routine can use the address to copy the results into
the temporary buffer using memcpy, which returns the destination address (e.g., r0 has the desired value),
or passes it to in-line code which uses r0 as a base register.

2.3 Runtime Debugging Support
It is one of the most difficult for C-SKY V2 CPU to trace stack. Tracing is complicated because the linkage
convention does not mandate a frame pointer register and does not provide with any back-chain construct.
This section describes rules for generating function prologues that can be easily decoded by a debugger
to determine the size of a stack frame, the location of the return address, and the location of any saved
non-volatile registers.

2.3.1 Function Prologues in C-SKY V2.0
Function prologues acquire stack space needed by the function to store local variables. This includes space
the function uses to save non-volatile registers. Prologue instruction sequences can take a number of forms.
A set of working assumptions about function prologues follows.
The function prologue is the only place in the function that acquires stack space, other than later calls to
alloca().
The function prologue uses only the following classes of instructions.
subi sp, imm (Note that this might appear multiple times in a prologue)
subi sp, rx
push
st.w rx, (sp, disp)
mov rn, sp

This is optional support for traceback through alloca() using functions, and also marks the final instruction
in the prologue.
The function prologue is organized roughly as:

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

15

Chapter 2. Lower-level Binary interfaces

• If stdarg, acquire space to store volatile registers; store volatile registers.
• Acquire space to store non-volatile registers.
• Store non-volatile registers that may be modified in this function.
• Acquire any additional stack space required. This space acquisition might be folded in with earlier
ones if the total space allocated is no more than 32 bytes.
• If needed in this function, copy the stack pointer into one of the non-volatile registers to act as a frame
pointer.
• Larger frames should allocate the register save space and then allocate the remainder of the required
stack space rather than perform a single large stack acquisition. If the stack is acquired in a single
allocation before the non-volatile registers are saved, then another base register is needed to reach the
location for the stored registers. The prologue recognition code in the debugger does not recognize
using alternate base registers to store the non-volatile registers as being part of the prologue.
This sequence allows the stack pointer to be modified several times.

2.3.2 Stack Tracing
Stack tracing for the C-SKY V2 CPU depends on the ability to determine the entry point for a function,
given a PC value in that function. Since there are no unique prologue-only patterns in the instruction stream
that can be identified by scanning backwards from the current PC. So a symbol table for the executable file
must be present. The symbols need not be complete DWARF information.
Placing a specific byte pattern just before the prologue is not sufficient to identify the beginning of a function
because the pattern can also appear within the body of the function as part of a literal table. In code-size
sensitive environments, the extra space consumed by such a byte pattern is undesirable.
The stack tracing code iteratively performs the following:
1. Get the current PC.
2. Find the beginning of the containing function. Stop if this can’t be determined.
3. Decode the prologue starting at the function’s entry.
4. Determine the “top of frame” from the framesize information described in the pro- logue. This is
either an adjustment to the stack pointer or a “pseudo-frame pointer” if the prologue ends with a
frame pointer generating instruction.
5. Recover stored non-volatile registers based on the offsets described in the prologue. Repeat for the
next frame.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

16

CHAPTER

3

High language Issures

This chapter would be divided into several sections to be illustrated as follows.
• C preprocessor predefinitions
• Inline assembly syntax
• Name mapping

3.1 C preprocessor predefinitions
All C language compilers must predefine such symbol related to C-SKY CPU, __CKCORE__ ,
__CSKY__ , and __csky__ with the value “1” to indicate that the compiler targets the C-SKY V1.0 processor, and the value “2” to indicate that the compiler targets the C-SKY V2.0 processor. __CSKYABI__
, __cskyabi__ with the value “1” to indicate that the compiler targets the C-SKY ABI V1.0, and the value
“2” to indicate that the compiler targets the C-SKY ABI V2.0.
When big endian was configured in target machine, all C language compilers must predefine the symbol
__BIG_ENDIAN__ , or symbol __LITTLE_ENDIAN__ .

3.2 Inline assembly syntax
3.2.1 Overview
When developing for the special applications or taking the advantage of recently advanced instructions
which temporally can’t be generated by compiler, it is needed to cast our sight to the assembly language.
With assisttant of assembly code, developer can operate the lower level registers or instructions. This is
machenism named of Inline Assembly provieded by GNU extension to normal C standard. Also, C-SKY
compiler supports this benefitial feature based on GCC(GNU compiler collection).

17

Chapter 3. High language Issures

Inline assembly is important primarily because of its ability to operate and make its output visible on C
variables. Because of this capability, “asm” works as an interface between the assembly instructions and the
“C” program that contains it.

3.2.2 Basic usage
format of basic inline assembly is very much straight forward. Its basic form is,
asm("assembly");

Example for C-SKY V2.0 is as follow.
/* move content of r1 to r0. */
asm("mov r0, r1");
/* move 0x2 to r2. */
__asm__("movi r2, 0x");

You might have noticed that here I’ve used asm and __asm__. Both are valid. We can use __asm__
if the keyword asm conflicts with something in our program. If we have more than one instructions, we
write one per line in double quotes, and also suffix a ’n’ and ’t’ to the instruction, since compiler sends each
instruction as a string to assembler and by using the newline/tab we send correctly formatted lines to
the assembler. The exmaple used for illustrating this as follows.
__asm__ ("mov r8, r0\n\t"
"mov r1, r9\n\t"
"stw r1, (r8,4)\n\t");

If in our code we touch (ie, change the contents) some registers and return from asm without fixing those
changes, something bad is going to happen. This is because compiler have no idea about the changes in
the register contents and this leads us to trouble, especially when compiler makes some optimizations. It
will suppose that some register contains the value of some variable that we might have changed without
informing compiler, and it continues like nothing happened. What we can do is either use those instructions
having no side effects or fix things when we quit or wait for something to crash. This is where we want some
extended functionality. Extended asm provides us with that functionality.

3.2.3 Extended asm
In basic inline assembly, we had only instructions. In extended assembly, we can also specify the operands. It
allows us to specify the input registers, output registers and a list of clobbered registers. It is not mandatory
to specify the registers to use, we can leave that head ache to compiler and that probably fit into compiler’s
optimization scheme better. Anyway the basic format is.
asm ( assembler template
: output operands
: input operands
: list of clobbered registers
);

/* optional */
/* optional */
/* optional */

The assembler template consists of assembly instructions. Each operand is described by an operandconstraint string followed by the C expression in parentheses. A colon separates the assembler template
from the first output operand and another separates the last output operand from the first input, if any.
Commas separate the operands within each group. The total number of operands is limited to ten or to the
maximum number of operands in any instruction pattern in the machine description, whichever is greater.
If there are no output operands but there are input operands, you must place two succensive colons as the
placeholder at where the output operands would go. For instance,
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

18

Chapter 3. High language Issures

asm ("cmpei
%0, 0\n\t"
"bt 1\n\t"
"stw %0, (%1, 0)"
"1:\n\t"
: /* no output registers */
: "r" (count), "r"(dest)
: "memory"
);

The above inline fills if :math: count!=0, store count into the memory which dest point to. It also inform
compiler the contents of memory is changed. The following example will be served as role for expositing it
more clearer.
int a=10, b;
asm ("mov r1,
mov %0,
:"=r"(b)
:"r"(a)
:"r1"
);

%1
r1"
/* output */
/* input */
/* clobbered register */

Here what we did is taking the value of ‘a’ from ‘b through using assembly instructions. Some interesting
points are as follows.
• “b” is the output operand, referred to by %0 and “a” is the input operand, referred to by %1.
• “r” is a constraint on the operands. We’ll see constraints in detail later. For the time being, “r” says
to COMPILER to use any register for storing the operands. output operand constraint should have a
constraint modifier “=”. And this modifier says that it is the output operand and is write-only.
• There are two %’s prefixed to the register name. This helps COMPILER to distinguish between the
operands and registers. operands have a single % as prefix.
• The clobbered register r1 after the third colon tells compiler that the value of r1 would to be modified
inside “asm”, so compiler shouldn’t use this register to store any other value.
When the execution of “asm” is complete, “b” will reflect the updated value, as it is specified as an output
operand. In other words, the change of “b” inside “asm” is supposed to be reflected outside the “asm”.
3.2.3.1 Assembler Template
This section will uses some detailed description to explain the inline assembly grammar, e.g. either each
instruction in inline assembly or all instructions respectively enclosed by double quotes. Also, each instruction
should end with a delimiter, for instance, newline(n) or semicolon(;), ’n’ may be followed by a tab(t).
Operands corresponding to the C expressions are represented by %0, %1 … etc.
3.2.3.2 Operands
C expressions serve as a role for giving operands for the assembly instructions inside “asm”. Each operand
is written as first an operand constraint in double quotes. For output operands, there’ll be a constraint
modifier also within the quotes and then follows the C expression which stands for the operand.
“constraint” (C expression) is the general form. For output operands an additional modifier will be there.
Constraints are primarily used to decide the address mode for operands. They are also used for specifying
how the registers would be used.
If there are more than one operands, a comma should be introduced to separate them.
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

19

Chapter 3. High language Issures

In the assembler template, each operand is referenced by number. We might use following rule to number
all operands(including input operands and output operands). By assuming there are n operands, then the
number of each output operand will be numbered as zero with step 1 in ascending order, and the last input
operand is numbered as n-1.
Unlike input operands are not restricted, output operand expressions must be values. They may be expressions. The extended asm feature is usually used for machine instructions which the compiler itself does not
know as existing ;-). If the output expression cannot be directly addressed (for example, it is a bit-field),
our constraint must allow a register. In that case, compiler will use the register as the output of the asm,
and then store that register contents into the output.
As stated above, ordinary output operands must be write-only; compiler will assume that the values in
these operands before the instruction are dead and need not be generated. Extended asm also supports
input-output or read-write operands.
So now we can concentrate on some examples. We want to add a number by 5. For that we use the instruction
add.
asm ("mov %0, %1\n\t"
"cmplt %0, %0\n\t"
"addc %0, 5"
: "=r" (five_times_x)
: "r" (x)
);

Here our input is in ’x’. We didn’t specify which register to be used. compiler will choose some register for
input, one for output and does what we desired. If we want the input and output to reside in the same
register, we can tell compiler how to do so. Here we use those types of read-write operands. By specifying
proper constraints, here we do it.
asm ("cmplt %0, %0\n\t"
"addc %0, 5"
: "=r" (five_times_x)
: "0" (x)
);

Now the input and output operands are reside in the same register. But we don’t know which register.
In all the two examples above, we didn’t put any register to the clobber list. why? In the first two examples,
COMPILER decides the registers and it knows what changes happen.
3.2.3.3 Clobber List
Some instructions clobber some hardware registers. We have to list those registers in the clobber-list, ie the
field after the third ’:’ in the asm function. This is to inform compiler that we will use and modify them
ourselves. So compiler will not assume that the values it loads into these registers will be valid. We shouldn’t
list the input and output registers in this list. Because, compiler knows that “asm” uses them (because they
are specified explicitly as constraints). If the instructions use any other registers, implicitly or explicitly (and
the registers are not present either in input or in the output constraint list), then those registers have to be
specified in the clobbered list.
If our instruction can alter the condition code register, we have to add “cc” to the list of clobbered registers.
If our instruction modifies memory in an unpredictable fashion, add “memory” to the list of clobbered
registers. This will cause compiler to not keep memory values cached in registers across the assembler
instruction. We also have to add the volatile keyword if the memory affected is not listed in the inputs or
outputs of the asm.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

20

Chapter 3. High language Issures

We can read and write the clobbered registers as many times as we like. Consider the example of multiple
instructions in a template; it assumes the subroutine _foo accepts arguments in registers r1 and r2.
asm ("movl r2, %0 \n\t
movl r3, %1 \n\t
jsri _foo"
: /* no outputs */
: "g" (from), "g" (to)
: "r2", "r3"
);

3.2.3.4 Volatile
If you are familiar with kernel sources or some beautiful code like that, you must have seen many functions
declared as volatile or __volatile__ which follows an asm or __asm__.
If our assembly statement must execute where we put it, (i.e. must not be moved out of a loop as an
optimization), putting the keyword volatile after asm and before the ()’s. So as to keep it from moving,
deleting and all, we declare it as.
asm volatile ( ... : ... : ... : ...);

Use __volatile__ when we have to be very much careful.
If our assembly is just for doing some calculations and doesn’t have any side effects, it’s better not to use
the keyword volatile. Avoiding it helps compiler in optimizing the code and making it more beautiful.
In the section Some Useful Recipes, there are many examples for inline asm functions. There we can see the
clobber-list in details.
3.2.3.5 Constraints
Constraints can say whether an operand may be in a register; whether the operand can be a memory
reference, and which kinds of address; whether the operand may be an immediate constant, and which
possible values (ie range of values) it may have…. etc.
There are a number of constraints in which few parts are used frequently. We’ll have a look at those
constraints.
1. Register operand constraint
When operands are specified using this constraint, they get stored in General Purpose Registers(GPR). Take
the following as an example:
asm ("mov %0, %1\n"
:"=r"(myval)
: "=r"(inval));

Here, the variable myval is kept in a register, and the value in inval is copied onto that register. When
the “r” constraint is specified, compiler may keep the variable in any of the available GPRs. To specify the
register, you must directly specify the register name via using specific register constraints. They are:
For example:
__asm__ __volatile__ ("mthi %1"
:"=h"(j)
:"r"(i));

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

21

Chapter 3. High language Issures

2. Memory opernad contraint(m)
When the operands are preversed in the memory, any operations operated on them will occur directly in the
memory location, as opposed to register constraints, which first store the value into a register to be modified
and then write it back to the stack slot. But register constraints are usually used only when it is absolutely
necessary for them to significantly speed up the process. Memory constraints can be used most efficiently
in cases where a C variable needs to be updated inside “asm” and you really don’t want to use a register to
hold its value. For example, the value of input is stored in the memory location(loc):
3. Matching constraints
In some cases, a single variable may serve as both the input and the output operand. Such cases may be
specified in “asm” by using corresponding constraints.
asm ("inct %0" :"=a"(var):"0"(var));

This constraint can be used on following scenario:
• In cases where input is read from a variable or the variable is modified and modification is written
back to the same variable.
• In cases where separate instances of input and output operands are not necessary.
Using of corresponding of constraints would have significant impact on efficient use of available registers.
By using constraints, for more precise control over the effects of constraints, compiler will provides us
with constraint modifiers. Mostly used constraint modifiers are listed as below.
• “=” means that this operand is write-only for this instruction. But, note that previous value is
discarded and replaced by output data.
• “&” means that this operand is an early clobber operand, which is modified before the instruction is
finished using the input operands. Therefore, this operand may not lie in a register that is used as an
input operand or as part of any memory address. An input operand can be tied to an early clobber
operand if its only use it as an input before the early result is broken.

3.2.4 Examples
• addition of two integer
int main(void)
{
int foo = 10, bar = 15;
__asm__ __volatile__(“cmplt
%1, %1\n\t”
"addc %1,%2"
:"=a"(foo)
:"0"(foo), "b"(bar));
printf("foo+bar=%d\n", foo);
return 0;
}

The ‘=’ sign indicates the output register.
__asm__ __volatile__("addu
%0,%1\n"
: "=m" (my_var)
: "ir" (my_int), "m" (my_var)
: /* no clobber-list */);

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

22

Chapter 3. High language Issures

In the output field, “=m” says that my_var is an output operand and resides in memory. Similarly,
“ir” says that, my_int is integral and should reside in some register (recall the table we saw above).
No registers are in the clobber list.
• Memory access
int main(int argc, char **argv)
{
int i;
char kk[10]
char ch;
__asm__ __volatile__ ("ldw %0, %1"
: "=r"(i)
: "m"(argc));
__asm__ __volatile__ ("stw %1, %0"
: "=o"(kk)
: "r"(i));
__asm__ __volatile__ ("stw %0, %1"
: "=r"(i)
: "V"(argc));
__asm__ __volatile__ ("stw %1, %0"
: "=m"(kk[5])
: "r"(ch));
return 0;
}

• Linux System Calls
ON Linux platform, system calls are implemented using inline assembly. All the system calls are
written as macros. For example, a system call with 1 arguments is defined as a macro as shown below.
#define _syscall1(type, name, atype, a)
type name(atype a)
{
register long __name __asm__("r1") = __NR_##name;
register long __res __asm__("r2") = a;
__asm__ __volatile__ ("trap 0\n\t"
: "=r" (__res)
: "r" (__name),
"0" (__res)
: "r1", "r2");
if ((unsigned long)(__res) >= (unsigned long)(-125))
{
*__errno_location () = -__res;
__res = -1;
}
return (type)__res;
}

Whenever a system call with 1 arguments occurs, the macro shown above is used for executing the
specified function call. After call finishing, the syscall number is placed in r1, then each parameters in
r2. And finally “trap 0” is the instruction which makes the system call work. The return value can be
collected from r2.
Note
“__errno_location()” is a function call, and will return the result in r2, and function call for
CKCORE will clobber r1 – r7, but “register long __res __asm__(“r2”)” use “r2” also, so there

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

23

Chapter 3. High language Issures

is a bug in the above example, It must be:
{
long __error = __res;
*__errno_location () = -__error;
__res = -1;
}

3.3 Name mapping
Externally visibility names a specified name in the C language must be mapped through to assembly language
without change. We will use following example to illustrate this point.
void testfunc() { return;}

it will generates assembly code similar to the following fragment.
testfunc:
rts

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

24

CHAPTER

4

ELF file format

C-SKY V2 CPU tools use ELF object file formats(1.2 version) and DWARF 2.0 debugging information
formats, as described in System V Application Binary Interface, from The Santa Cruz Operation, Inc. ELF
and DWARF provide a suitable basis for representing the information needed for embedded applications.
This section describes particular fields related to the ELF and DWARF formats that differ from the basic
standards for those format.
This chapter will introduces several sections to exposite the ELF file format in detail.
• ELF Header
• Section Layout
• Symbol Table Format
• Relocation Information Format
• Program Loading
• Dynamic Linking
• PIC Examples
• Debugging Information Format

4.1 ELF Header
• e_machine
The e_machine field of the ELF header contains the decimal value 39 (hexadecimal 0x27) which is
named EM_CSKY.
• e_ident
For file identification in e_ident[] must be the values listed in Table 4.1.

25

Chapter 4. ELF file format

Table 4.1: C-SKY e_Ident Fields
C-SKY e_Ident Fields
eident[EICLASS]
eident[EIDATA]

ELFCLASS32
ELFDATA2LSB
ELFDATA2M SB

or

For all 32 bit implementations
The choice will be governed by the default data order in the execution evironment. ELFDATALSB:
Little Endian ELFDATA2MSB: Big Endian

• e_flags
In ABI v0.1, the ELF header e_flags member contains zero, because the C-SKY processor family
defines no flags at that time. Now e_flags are shown in Table 4.2. Undesignated bits are reserved to
future revisions of this specification.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

26

Chapter 4. ELF file format

Name
EF_CSKY_ABIMASK

Table 4.2: C-SKY-Specified e_flags
Mask
Value-Meaning
0xF0000000
The integer value formed by these 8
bits identify extensions to the C-SKY
A BI V0.1; In ABI V0.1, the ELF
header e_flags member contains zero,
because the C-SKY processor family
defines no flags at that time; values
> 0 indicates the object file or executbale contains program text using
newer version of CSKY-ABI than CSKY ABI V0.1
0b0000: V0.1
0b0001: V1.0
0b0010: V2.0
…

Other information

0x0FFF0000
EF_CSKY_PIC

0x00010000

EF_CSKY_CPIC

0x00020000

Reserved
EF_CSKY_PROCESSOR 0x0000FFFF

0x0FFC0000

Other information
This bit is asserted when target file
contains posi tion independent code
that can be relocated in memory
This bit is asserted when target file
contains code that follows standard
calling convention for calling PIC. It’s
not necessarilly position independent
for object code. The EF_CSKY_PIC
and EF_CSKY_CPIC flag can only
be used exclusively.
Reserved
This integer consists of 8 bits, which
used for identifing the instruction set
version as follows.
(1<<0): CK510
(1<<1): CK610
(1<<2): CK801
(1<<3): CK810
…
(1<<14): DSP V1.0
(1<<15): MAC set

4.2 Section Layout
4.2.1 Section Alignment
The object generator (compiler or assembler) supplyes alignment information for the linker. The default
alignment is eight bytes. Object producers must ensure that generated objects specify required alignment.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

27

Chapter 4. ELF file format

For example, an object file must reflect the fact that four-byte alignment is required in the data section.

4.2.2 Section Attributs
Table 4.3 defines section attributes that are available for C-SKY V2 CPU tools. These attributes are
additions to the ELF standard flags shown in Table 4.4.
Table 4.3: CKCORE Section Attribute Flags
CKCORE Section Attribute Flags
Name
Value
SHF_CKCORE_NOREAD 0x80000000
The SHF_CKCORE_NOREAD attribute allows the specification of code that is executable but not readable. Plain ELF assumes that all segments have read attributes, which is why there is no read permission
attribute in the ELF attribute list. In embedded applications, “execute-only” sections that allow hiding the
implementation are often desirable.
Table 4.4: ELF Section Attribute Flags
ELF Section Attribute Flags
Name | Value
SHF_WRITE
0x00000001
SHF_ALLOC
0x00000002
SHF_EXECINSTR 0x00000004

4.2.3 Special Sections
Various sections hold program and control information. Table 4.4 shows sections used by the system, the
indicated types, and attributes. These are additional extensions to ELF standards shown in Table 4.5. The
ELF standard reserves section names beginning with a period (“.”), but applications may use those sections
if their existing meanings are satisfactory.
C-SKY currently support PIC technique, when compiling PIC, the link editor will create .got and .plt
sections, see “ Global Offset Table “ and “ Procedure Linkage Table “.
Table 4.5: C-SKY V2 CPU Tools Special Sections
C-SKY Section names for PIC
Name
Type
Attributs
.got
SHT_PROGBITS SHF_ALLOC+SHF_WRITE
.plt
SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR
Note
It is strongly recommended that read-only constants, such as string literals, would to be placed into the
.rodata section instead of the .text section. The space that these add to .text can have a severe impact on
addressability, requiring the use of larger branch instructions and reducing the chances for sharing of values
in literal tables.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

28

Chapter 4. ELF file format

Table 4.6: ELF Reserved Section Names
ELF Reserved Section Names
Name
Type
Attributes
.bss
SHT_NOBITS
SHF_ALLOC+SHF_WRITE
.comment SHT_PROGBITS none
.data
SHT_PROGBITS SHF_ALLOC+SHF_WRITE
.data1
SHT_PROGBITS SHF_ALLOC+SHF_WRITE
.debug
SHT_PROGBITS none
.dynamic
SHT_DYNAMIC
–
.dynstr
SHT_STRTAB
SHF_ALLOC
.dynsym
SHT_DYNSYM
SHF_ALLOC
.fini
SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR
.hash
SHT_HASH
SHF_ALLOC
.init
SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR
.interp
SHT_PROGBITS –
.line
SHT_PROGBITS none
.note
SHT_NOTE
none
.rel*
SHT_REL
–
.rela*
SHT_RELA
–
.rodata
SHT_PROGBITS SHF_ALLOC
.rodata1
SHT_PROGBITS SHF_ALLOC
.shstrtab
SHT_STRTAB
none
.strtab
SHT_STRTAB
–
.symtab
SHT_SYMTAB
–
.text
SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR

4.3 Symbol Table Format
There are no C-SKY V2 CPU symbol table requirements beyond the base ELF standards.

4.4 Relocation Information Format
4.4.1 Reclocation Fields
Relocation entries describe how to alter the instruction and data relocation fields as shown in Table 4.7. The
choice of the relocation type numbers as encoded in the ELF object file is defined in Table 4-8 Relocation
Type Encodings.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

29

Chapter 4. ELF file format

Table 4.7: Relocation Fields
Field
word32
disp8

disp11

disp26

disp16

disp10

word_hi16

word_lo16

gb_disp_hi16

gb_disp_lo16

gb_offset_hi16

gb_offset_lo16

disp12

gb_got_hi16
Release 2.1
gb_got_lo16

Description
This specifies a 32-bit field occupying four bytes. This address is
NOT required to be 4-byte aligned.
This corresponds to the scaled 8-bit displace ment addressing mode.
The relocation is the low-order 8 bits of the 16 bits addressed in
the relocation type. jsri, jmpi, & lrw use this 8-bit displacement
addressing mode.
This corresponds to the scaled 11-bit displac ement addressing
mode. The relocation is the low-order 11 bits of the 16 bits addressed in the relocation type. br, bf, bt & bsr use this 11-bit
displacement addressing mode.
This corresponds to the scaled 26-bit displa cement addressing
mode. The relocation is the low-order 26 bits of the 32 bits addressed in the relocation type. bsr use this 26-bit displacement
addressing mode.
This corresponds to the scaled 16-bit displacement addressing mode.
The relocation is the low-order 16 bits of the 32 bits addressed in
the relocation type. br,be, bne, bez, bnez, bhz, blsz, bhsz, bt, bf,
jmpi, jsri use this 16-bit displacement addressing mode.
This corresponds to the scaled 10-bit displacement addressing mode.
The relocation is the low-order 10 bits of the 16 bits addressed in the
relocation type. br, bsr, bt, bf use this 10-bit displacement address
ing mode.
This corresponds to the most significant 16 bits in the 32 bits value
of the symbol referred by movih, addi, subi, andi, andni, ori, xori,
pldr, pldw, cmphsi, cmplti, cmpnei, movi instruction. To calculate
symbol value = (word_hi16 << 16 | word_lo16)
This corresponds to the least significant 16 bits in the 32 bits value
of the symbol referred by movih, addi, subi, andi, andni, ori, xori,
pldr, pldw, cmphsi, cmplti, cmpnei , movi instruction. To calculate
symbol value = (word_hi16 << 16 | word_lo16)
This corresponds to the most significant 16 bits in the 32 bits value
of the (GOT Base - pc) referred by movih, addi, subi, andi, andni,
ori, xori, pldr, pldw, cmphsi, cmplti, cmpnei, movi instruction. To
calculate GOT Base = (gb_disp_hi16 << 16 gb_disp_lo16) + pc
This corresponds to the least significant 16 bits in the 32 bits value
of the (GOT Base - pc) referred by movih, addi, subi, andi, andni,
ori, xori, pldr, pldw, cmphsi, cmplti, cmpnei, movi instruction. To
calculate GOT Base = (gb_disp_hi16 << 16 gb_disp_lo16) + pc
This corresponds to the most significant 16 bits in the 32 bits value
of the (GOT Base – Symbol value) referred by movih, addi, subi,
andi, andni, ori, xori, pldr, pldw, cmphsi, cmplti, cmpnei, movi
instruction. To calculate symbol value = gb - (gb_offset_hi16 <<
16 | word32_lo16)
This corresponds to the most significant 16 bits in the 32 bits value
of the (GOT Base – Symbol value) referred by movih, addi, subi,
andi, andni, ori, xori, pldr, pldw, cmphsi, cmplti, cmpnei, movi
instruction. To calculate symbol value = gb - (gb_offset_hi16 <<
16 | word32_lo16)
This corresponds to the scaled 12-bit displacement addressing mode.
The relocation is the low-order 12 bits of the 32 bits addressed in
the relocation type. ld/st use this 12-bit displacement addressing
mode.
This corresponds to the most significant 16 bits in the 32 bits value
Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.
of the entry index in GOT referred by movih, addi, subi, andi, andni,
ori, xori, pldr, pldw, cmphsi, cmplti, cmpnei, movi instruction.
his corresponds to the least significant 16 bits in the 32 bits value

CPU
all
V1.0

V2.0 32-bit

V2.0 32-bit

V2.0 16-bit

V2.0 16-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

V2.0 32-bit

30

Chapter 4. ELF file format

The object file supports the 32-bit relocations for 32-bit data (addressing constants in memory). Both
absolute and PC-relative relocations are defined.
Note that the 32 bits where the relocation is to be applied need not be on a 32-bit boundary. The relocation
entry points to the address of the 32 bits to be adjusted by the relocation entry. The relocation adds the
appropriate value (either the 32-bit value or the 32-bit displacement) to the existing contents of the 32 bits
at that address.
A packed data structure can cause a 32-bit relocation to be misaligned in the object file. This might be
done with a C compiler extension, or by means of hand-crafted assembly, in order to save data space (but
the misaligned data must be accessed piece-wise to avoid alignment exceptions). The linker must be able to
deal with this case.
Scaled 11-bit displacement mode is used in br, bf, bt, and bsr instructions. The 11-bit value indicates
the number of halfwords from PC+2 to the target address. The relocation entry must point to the 16-bit
instruction that contains the displacement.
Calculations below assume the actions are transforming a relocatable file into either an executable or a
shared object file. Conceptually, the linker merges one or more relocatable files to form the output. It
first determines how to combine and locate the input files; then it updates the symbol values, and finally it
performs the relocation.
Relocations applied to executable or shared object files are similar and accomplish the same result. Descriptions below use the following notation.
A
This means the addend used to compute the value of the relocatable field.
B
This means the base address at which a shared object has been loaded into memory during
execution. Generally a shared object file is built with a 0 base virtual address, but the execution
address will be different.
BTEXT
This means the base address of .text section at which an elf file has been loaded into memory
during execution. Generally an elf file is built with a 0 base virtual address, but the execution
address will be different.
BDATA
This means the base address of .data section at which an elf file has been loaded into memory
during execution. Generally an elf file is built with a 0 base virtual address, but the execution
address will be different.
P
This means the place (section offset or address) of the storage unit being relocated (computed
using r_offset).
S
This means the value of the symbol whose index resides in the relocation entry, unless the the
symbol is STB_LOCAL and is of type STT_SECTION in which case S represents the original
sh_addr minus the final sh_addr.
G
In C-SKY V1.0 this means the offset into the global offset table at which the address of the
relocation entry symbol resides during execution. In C-SKY V2.0 this means the index into the

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

31

Chapter 4. ELF file format

global offset table at which the address of the relocation entry symbol resides during execution.
See ‘‘PIC Examples’’ and ‘‘Global Offset Table’’ for more information.
GOT
This means the address of the global offset table. See “Global Offset Table”
L
This means the place(section offset or address) of the procedure linkage table entry for a symbol.
A procedure linkage table entry redirects a function call to the proper destination. The link
editor builds the initial procedure linkage table, and the dynamic linker modifies the entries
during execution. See “Procedure Linkage Table” below for more information.
A relocation entry r_offset value designates the offset or virtual address of the first byte of the affected
storage unit. The relocation type specifies which bits to change and how to calculate their values. Because
C-SKY V2 CPU uses only Elf32_Rela relocation entries, the relocated field does not hold the addend, but
relocation entry holds it.

4.4.2 Relocation Types
This section describes values and algorithms used for relocations. In particular, it describes values the
compiler/assembler must leave in place and how the linker mod- ifies those values.
Table 4.8 shows semantics of relocation operations. Key S indicates the final value assigned to the symbol
referenced in the relocation record. Key A is the addend value specified in the relocation record. Key P
indicates the address of the relocation (e.g., the address being modified).
Table 4.8: Relocation Type Encodings
Name
Value Field
Calculation
R_CKCORE_NONE
0
none
none
R_CKCORE_ADDR32
1
word32
S+A
R_CKCORE_PCREL_IMM8BY4
2
dis8
((S+A-P)>>2)&&0xff
R_CKCORE_PCREL_IMM11BY2
3
disp11
((S+A-P)>>1)&0x7ff
R_CKCORE_PCREL_IMM4BY2
4
none
unsupported, deleted
R_CKCORE_PCREL32
5
word32
S+A-P
R_CKCORE_PCREL_JSR_IMM11BY2
6
disp11
((S+A-P)>>1)&0x7ff
R_CKCORE_GNU_VTINHERIT
7
??
R_CKCORE_GNU_VTENTRY
8
??
R_CKCORE_RELATIVE
9
word32
B+A
R_CKCORE_COPY
10
none
none
R_CKCORE_GLOB_DAT
11
word32
S
R_CKCORE_JUMP_SLOT
12
word32
S
R_CKCORE_GOTOFF
13
word32
S + A - GOT
R_CKCORE_GOTPC
14
word32
GOT+A-P
R_CKCORE_GOT32
15
word32
G
R_CKCORE_PLT32
16
word32
G
R_CKCORE_ADDRGOT
17
word32
GOT+G
R_CKCORE_ADDRPLT
18
word32
GOT+G
R_CKCORE_PCREL_IMM26BY2
19
disp26
((S+A–P)>>1)&0x3ffffff
R_CKCORE_PCREL_IMM16BY2
20
disp16
((S+A-P)>>1)&0xffff
R_CKCORE_PCREL_IMM16BY4
21
disp16
((S+A-P)>>2)&0xffff
R_CKCORE_PCREL_IMM10BY2
22
disp10
((S+A-P)>>1)&0x3ff
R_CKCORE_PCREL_IMM10BY4
23
disp10
((S+A-P)>>2)&0x3ff
Continued
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

I_SET
ALL
ALL
V1.0
V1.0
None
??
V1.0
??
??
ALL
ALL
ALL
ALL
V1.0
V1.0
V1.0
V1.0
V1.0 32-bit
V1.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 16-bit
V2.0 16-bit
on next page

32

Chapter 4. ELF file format

Table 4.8 – continued from previous page
Name
Value Field
Calculation
R_CKCORE_ADDR_HI16
24
word_hi16
((S+A)>>16)&0xffff
R_CKCORE_ADDR_LO16
25
word_lo16
(S+A)&0xffff
R_CKCORE_GOTPC_HI16
26
gb_disp_hi16
((GOT+A-P)>16)&0xffff
R_CKCORE_GOTPC_LO16
27
gb_disp_lo16
(GOT+A-P)&0xffff
R_CKCORE_GOTOFF_HI16
28
gb_offset_hi16 ((S+A-GOT) >> 16) & 0xffff
R_CKCORE_GOTOFF_LO16
29
gb_offset_lo16 (S+A-GOT) & 0xffff
R_CKCORE_GOT12
30
disp12
G
R_CKCORE_GOT_HI16
31
gb_got_hi16
(G >> 16) & 0xffff
R_CKCORE_GOT_LO16
32
gb_got_lo16
G & 0xffff
R_CKCORE_PLT12
33
disp12
G
R_CKCORE_PLT_HI16
34
gb_got_hi16
(G >> 16) & 0xffff
R_CKCORE_PLT_LO16
35
gb_got_lo16
G & 0xffff
R_CKCORE_ADDRGOT_HI16
36
gb_got_hi16
(GOT+G*4)& 0xffff
R_CKCORE_ADDRGOT_LO16
37
gb_got_lo16
(GOT+G*4) & 0xffff
R_CKCORE_ADDRPLT_HI16
38
gb_got_hi16
((GOT+G*4) >> 16) & 0xffff
R_CKCORE_ADDRPLT_LO16
39
gb_got_lo16
(GOT+G*4) & 0xffff
R_CKCORE_PCREL_JSR_IMM26BY2
40
disp26
((S+A–P)>>1)&0x3ffffff
R_CKCORE_TOFFSET_LO16
41
disp16
(S+A-BTEXT) & 0xffff
R_CKCORE_DOFFSET_LO16
42
disp16
(S+A-BTEXT) & 0xffff
R_CKCORE_PCREL_IMM18BY2
43
disp16
((S+A–P)>>1)&0x3ffff
R_CKCORE_DOFFSET_IMM18ABS
44
word_disp18
(S+A-BDATA)&0x3ffff
R_CKCORE_DOFFSET_IMM18BY2ABS 45
word_disp18
((S+A-BDATA)>>1)&0x3ffff
R_CKCORE_DOFFSET_IMM18BY4ABS 46
word_disp18
((S+A-BDATA)>>2)&0x3ffff
R_CKCORE_GOTOFF_IMM18
47
disp18
?
R_CKCORE_GOT_IMM18BY4
48
word_disp18
(G >> 2)
R_CKCORE_PLT_IMM18BY4
49
word_disp 18
(G >> 2)
R_CKCORE_PCREL_IMM7BY4
50
disp7
((S+A-P) >>2) & 0x7f

4.4.2.1 Static Relocations in Data Sections
R_CKCORE_ADDR32
In DATA sections, absolute 32-bit relocation adds the relocated symbols value to the existing
content of the location specified. Consider the example
.data
D1:
.long 0x10
D2:
.long SYMBOL+ 1234 # <- R_CKCORE_ADDR32 for this word32 field.

The object file emitted by the compiler has a relocation entry for SYMBOL that references the
address of this word. The existing content of the 32 bits at the specified address are overwritten
with the new value.
So in the example, the offset of the relocation is 4, symbol value is SYMBOL in.data section or
other section, addend is 1234.
4.4.2.2 Static Relocations in Text Sections
R_CKCORE_ADDR32

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

33

I_SET
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 32-bit
V2.0 16-bit

Chapter 4. ELF file format

In TEXT sections, absolute 32-bit relocation adds the relocated symbols value to the existing
content of the location specified. Consider the example.
Code example for R_CKCORE_ADDR32 in text
.text
...
jmpi
symbol+1234
...
jsri
printf

# <- R_CKCORE_ADDR32 for this word32 field.

# <- R_CKCORE_ADDR32 for this word32 field.

The object file emitted by the compiler has a relocation entry for symbol that references the
address of this word. The existing content of the 32 bits at the specified address are overwritten
with the new value.
So for the second relocation entry in the example, the offset is the [jsri located PC- .text base
address], symbol value is printf, addend is 0.
4.4.2.3 Static C-SKY V1 Relocation in Text Sections
R_CKCORE_PCRELIMM8BY4
Occur when jmpi/jsri/lrw instructions reference a target that is in a symbol which is identified
in a new section. For examble: (jsri has the same case)
Code example for R_CKCORE_PCRELIMM8BY4
.text
mycode:
...
lrw r1, [myconst]
...
.data
myconst:
.long
0x12345678

It is a obsoleted relocation type.
R_CKCORE_PCRELIMM11BY2
Occur when br, bf, bt, and bsr instructions (typically bsr) reference a target that is not in the
current object file. They can also occur when the target is in a separate section of the same
object file, but these occurrences must be resolved by the compiler/assembler and not appear as
relocation entries.
Code example for R_CKCORE_PCRELIMM11BY2
.import __exit
.export tbsr
.text
tbsr:
bsr __exit

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 11 bits of the instruction are overwritten with the newly calculated
displacement.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

34

Chapter 4. ELF file format

NOTE
The bsr instruction encoding is the distance from PC+2 to the target. This adjustment
must be made in the compiler/assembler. The emitted relocation record for a bsr to
symbol X must be to X+(–2); in other words, the symbol must be X and the addend
field of the relocation record must contain –2.
R_CKCORE_PCRELIMM4BY2
It is a obsoleted relocation type. This relocation come from MCORE “loopt” instruction, and
C-SKY V2 CPU has no any “loopt”, so this relocation should not appear in any C-SKY V2 CPU
binary files
R_CKCORE_PCREL32
This relocation type computes the difference between a symbol’s value and the address or section
offset to be relocated. It is a obsoleted relocation type for C-SKY.
R_CKCORE_PCRELJSR_IMM11BY2
Like PCRELIMM11BY2, this relocation indicates that there is a ‘jsri’ at the specified address.
There is a separate relocation entry for the literal pool entry that it references (So there are 2
relocation entry for “jsri” when assemble with –jsri2bsr option), but we might be able to change
the jsri to a bsr if the target turns out to be close enough [even though we won’t reclaim the
literal pool entry, we’ll get some runtime efficiency back]. Note that this is a relocation that we
are allowed to safely ignore.
4.4.2.4 Static C-SKY V2.0 Relocation in Text Sections
R_CKCORE_PCREL_IMM26BY2
Occur when br, bsr 32-bit instructions (typically bsr) reference a target that is not in the current
object file. They can also occur when the target is in a separate section of the same object file,
but these occurrences must be resolved by the compiler or assembler and not appear as relocation
entries.
Code example for R_CKCORE_PCREL_IMM26BY2
.import __exit
.export tbsr
.text
tbsr:
bsr __exit

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 26 bits of the instruction are overwritten with the newly calculated
displacement.
NOTE
The bsr instruction encoding is the distance from PC+2 to the target. This adjustment
must be made in the compiler/assembler. The emitted relocation record for a bsr to
symbol X must be to X+(–2); in other words, the symbol must be X and the addend
field of the relocation record must contain –2.
R_CKCORE_PCRELJSR_ IMM26BY2
Like R_CKCORE_PCREL_IMM26BY2 , this relocation indicates that there is a ‘jsri’ at the
specified address. There is a separate relocation entry for the literal pool entry that it references
(So there are 2 relocation entry for “jsri” when assemble with –jsri2bsr option), but we might

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

35

Chapter 4. ELF file format

be able to change the jsri to a bsr if the target turns out to be close enough [even though we
won’t reclaim the literal pool entry, we’ll get some runtime efficiency back]. Note that this is a
relocation that we are allowed to safely ignore.
R_CKCORE_PRREL_IMM16BY2
Occur when be, bne, bf, bt, bez, bnez, bhz, bhsz, blsz 32-bit instructions reference a target that
is not in the current object file. They can also occur when the target is in a separate section of
the same object file, but these occurrences must be resolved by the compiler or assembler and
not appear as relocation entries.
.import __exit
.export tbsr
.text
tbsr:
bt __exit

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 16 bits of the instruction are overwritten with the newly calculated
displacement.
NOTE
The bsr instruction encoding is the distance from PC+2 to the target. This adjustment
must be made in the compiler/assembler. The emitted relocation record for a bsr to
symbol X must be to X+(–2); in other words, the symbol must be X and the addend
field of the relocation record must contain –2.
R_CKCORE_PRREL_IMM16BY4
Occur when jmpi,jsri 32-bit instructions reference a target that is in a symbol which is identified
in a new section or in other object file. For examble: (jsri has the same case)
.text
mycode:
...
jsri [myconst]
...
.data
myconst:
.long
0x12345678

R_CKCORE_PRREL_IMM10BY2
Occur when br, bsr, bf, bt 16-bit instructions reference a target that is not in the current object
file. They can also occur when the target is in a separate section of the same object file, but these
occurrences must be resolved by the compiler or assembler and not appear as relocation entries.
.import __exit
.export tbsr
.text
tbsr:
bt __exit

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 10 bits of the instruction are overwritten with the newly calculated
displacement.
NOTE

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

36

Chapter 4. ELF file format

The bsr instruction encoding is the distance from PC+2 to the target. This adjustment
must be made in the compiler/assembler. The emitted relocation record for a bsr to
symbol X must be to X+(–2); in other words, the symbol must be X and the addend
field of the relocation record must contain –2.
R_CKCORE_PRREL_IMM10BY4
Occur when jsri 16-bit instructions reference a target that is in a symbol which is identified in a
new section or in other object file. For examble: (jsri has the same case)
.text
mycode:
...
jsri [myconst]
...
.data
myconst:
.long
0x12345678

R_CKCORE_ADDR_HI16
In C-SKY V2.0 instruction set, there are two instructions movih and ori to move a 32-bit absolute
address into a register, see Figure 4-10 Code example for R_CKCORE_ADDR_HI16. This
relocation type is used to calculate the lower 16-bit in movih instruction.
.text
...
movih rz, (symbol+1234) >> 16
ori
rz, (symbol+1234) & 0xffff
...

R_CKCORE_ADDR_LO16
In C-SKY V2.0 instruction set, there are two instructions movih and ori to move a 32-bit absolute
address into a register, see Figure 4-10 Code example for R_CKCORE_ADDR_HI16. This
relocation type is used to calculate the lower 16-bit in ori instruction.
R_CKCORE_PCREL_IMM18BY4
Occur when grs 32-bit instructions reference a function symbol that is in the text section. They
can occur when the symbol is in the same or different object file, but these occurrences must
be resolved by the compiler, assembler and linker, but not appear as relocation entries in the
executable elf file.
.import __exit
.export tbsr
.text
tbsr:
grs r10, __exit

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 18 bits of the instruction are overwritten with the newly calculated
displacement.
R_CKCORE_DOFFSET_IMM18
Occur when lrs.b/srs.b/addi 32-bit instructions load/store the value of a symbol that is in the
data section with DATA section base address register rdb. They can occur when the symbol is

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

37

Chapter 4. ELF file format

in data section of the same or different object file, These occurrences must be resolved by the
compiler, assembler and linker, but not appear as relocation entries in the executable elf file.
.byte
myData
.export tlrsb
.text
tlrsb:
lrs.b r10, myData

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 18 bits of the instruction are overwritten with the newly calculated
displacement.
R_CKCORE_DOFFSET_IMM18BY2
Occur when lrs.h/srs.h 32-bit instructions load/store the value of a symbol that is in the data
section with DATA section base address register rdb. They can occur when the symbol is in data
section of the same or different object file, These occurrences must be resolved by the compiler,
assembler and linker, but not appear as relocation entries in the executable elf file.
.short myData
.export tlrsh
.text
tlrsh:
lrs.w r10, myData

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 18 bits of the instruction are overwritten with the newly calculated
displacement.
R_CKCORE_DOFFSET_IMM18BY4
Occur when lrs.w/srs.w 32-bit instructions load/store the value of a symbol that is in the data
section with DATA section base address register rdb. They can occur when the symbol is in data
section of the same or different object file, These occurrences must be resolved by the compiler,
assembler and linker, but not appear as relocation entries in the executable elf file.
.long
myData
.export tlrsw
.text
tlrsw:
lrs.w r10, myData

The relocation is calculated as shown in Table 4-8 Relocation Type Encodings. The existing
contents of the low-order 18 bits of the instruction are overwritten with the newly calculated
displacement.
4.4.2.5 Dynamic Relocations
R_CKCORE_RELATIVE
The linker editor creates this relocation type for dynamic linking. Its offset member gives a
location within a shared object that contains a value representing a relative address. The dynamic
linker computes the corresponding virtual address by adding the virtual address at which the
shared object was loaded to the relative address. Relocation entries for this type must specify 0
for the symbol table index.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

38

Chapter 4. ELF file format

R_CKCORE_COPY
R_CKCORE_COPY may only appear in executable objects where e_type is set to ET_EXEC.
The effect is to cause the dynamic linker to locate the target symbol in a shared library object
and then to copy the number of bytes specified by the st_size field to the place. The address of
the place is then used to pre-empt all other references to the specified symbol. It is an error if the
storage space allocated in the executable is insufficient to hold the full copy of the symbol. If the
object being copied contains dynamic relocations then the effect must be as if those relocations
were performed before the copy was made. Note
R_CKCORE_COPY is normally only used in SVr4 type environments where the executable
is not position independent and references by the code and read-only data sections cannot be
relocated dynamically to refer to an object that is defined in a shared library. The need for copy
relocations can be avoided if a compiler generates all code references to such objects indirectly
through a dynamically relocatable location, and if all static data references are placed in relocatable regions of the image. In practice, however, this is difficult to achieve without source-code
annotation; a better approach is to avoid defining static global data in shared libraries.
R_CKCORE_GLOB_DAT
This relocation type is used to set a global offset table entry to the address of the specified symbol.
The special relocation type allows one to deterimine the correspondence between symbols and
global offset table entries.
R_CKCORE_JMP_SLOT
The link editor creates this relocation type for dynamic linking. Its offset member gives the
location of a GOT entry. The dynamic linker modifies the procedure linkage table entry to
transfer control to the designated symbol’s address, see “Procedure Linkage Table”.
R_CKCORE_GOTOFF
In C-SKY V1.0, when referring to a local DATA or FUNCTION in text section, the compiler
and assembler create the code such as:
lrw rx, SYMBOL@GOTOFF
add rx, gb

and set a R_CKCORE_GOTOFF relocation for the linker; According this relocation type, the
linker computes the difference between a local symbol’s value and the address of the global offset
table. It additionally instructs the link editor to build the global offset table.
R_CKCORE_GOTPC
At the prologue of FUNCTION, the compiler create the code such as:
bsr .L1
.L1:
lrw rx, .L1@GOTPC
add rx, r15

The assembler set a R_CKCORE_GOTPC, According the relocation type, the link editor computes GOT-PC.
R_CKCORE_GOT32
In C-SKY V1.0, when referring to a global DATA or FUNCTION in text section, the compiler
and assembler create the code such as:

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

39

Chapter 4. ELF file format

lrw rx, SYMBOL@GOT
add rx, gb
ld ry,(rx, 0)

and set a R_CKCORE_GOT32 relocation for the linker; The linker create an entry in GOT,
computes the index in GOT for the called function symbol of which the value is stored in GOT,
set R_CKCORE_GLOB_DAT for dynamic linkage.
R_CKCORE_PLT32
In C-SKY V1.0, when calling a global FUNC in text section, the compiler and assembler create
the code such as:
lrw rx, FUNC@PLT
add rx, gb
ld ry,(rx, 0)
jsr ry

and set R_CKCORE_PLT32 relocation for the linker. The linker create an entry in GOT and
an entry in PLT, computes the index in GOT for the called function symbol of which the value
is stored in GOT, set R_CKCORE_JMP_SLOT relocation for dynamic linkage.
R_CKCORE_GOTOFF_HI16 & R_CKCORE_GOTOFF_LO16
In C-SKY V2.0, when referring to a local DATA or FUNCTION in text section, the compiler
and assembler create the code such as:
movih rx, SYMBOL@GOTOFF_HI16
ori rx, SYMBOL@GOTOFF_LO16
add rx, gb

and set a R_CKCORE_GOTOFF_HI16 & R_CKCORE_GOTOFF_LO16 relocation for the
linker; According this relocation type, the linker computes the difference between a local symbol’s
value and the address of the global offset table. It additionally instructs the link editor to build
the global offset table.
R_CKCORE_GOTPC_HI16 & R_CKCORE_GOTPC_LO16
In C-SKY V2.0, at the prologue of FUNCTION, the compiler create the code such as:
bsr .L1
.L1:
movih rx, .L1@GOTPC_HI16
ori rx, .L1@ GOTPC_LO16
add rx, r15

The assembler set a R_CKCORE_GOTPC_HI16 & R_CKCORE_GOTPC_HI16, According
these relocation types, the link editor computes GOT-PC.
R_CKCORE_GOT12
In C-SKY V2.0 instruction set, there is instructions ld/st which use 12 disp to the base address
register. When referring to a global DATA or FUNCTION in text section, the compiler and
assembler create the code such as:
ld rx, (gb, SYMBOL@GOT)

set a R_CKCORE_GOT12 relocation for the linker; The linker creates an entry in GOT,
changes the 12-bit fields in the 32-bit instruction with the entry index in GOT, and set
R_CKCORE_GLOB_DAT for dynamic linkage.
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

40

Chapter 4. ELF file format

R_CKCORE_GOT_HI16 & R_CKCORE_GOT_LO16
In C-SKY V2.0 instruction set, there is instructions ld/st which use 12 disp to the base address
register. When referring to a global DATA or FUNCTION in text section, the compiler and
assembler create the code such as:
movih rx, FUNC@GOT_HI16
ori rx, FUNC@GOT_LO16
ldr.w rx, (gb, rx << 0)

set a R_CKCORE_GOT_HI16 & R_CKCORE_GOT_LO16 relocation for the linker; The
linker creates an entry in GOT, changes the immediate fields in the 32-bit movih/ori instructions
with the entry offset in GOT, and set R_CKCORE_GLOB_DAT for dynamic linkage
R_CKCORE_ADDRGOT
In C-SKY V1.0, when referring to a global DATA or FUNCTION in text section of the executable
program, the compiler and assembler create the code such as:
lrw rx, SYMBOL@ADDRGOT
ld
rx, (rx,0)

set R_CKCORE_ADDRGOT relocation for the linker. The linker create an entry in GOT,
computes the GOT entry address for the called function symbol of which the value is stored in
GOT, set R_CKCORE_GLOB_DAT relocation for dynamic linkage.
R_CKCORE_ADDRGOT_HI16 & R_CKCORE_ADDRGOT_LO16
In C-SKY V2.0, when referring to a global DATA or FUNCTION in text section of the executable
program, the compiler and assembler create the code such as:
movih rx, FUNC@ADDRGOT_HI16
ori rx, FUNC@ADDRGOT_LO16
ldw rx, (rx, 0)

set a R_CKCORE_ADDRGOT_HI16 & R_CKCORE_ADDRGOT_LO16 relocation for the
linker; The linker create an entry in GOT, computes the GOT entry address for the called
function symbol of which the value is stored in GOT, set R_CKCORE_GLOB_DAT relocation
for dynamic linkage, and changes the immediate fields in the 32-bit movih/ori instructions with
the entry address.
R_CKCORE_PLT12
In C-SKY V2.0 instruction set, there is instructions ld/st which use 12 disp to the base address
register. When calling a global FUNC in text section, the compiler
and assembler create the code such as:
ld rx, (gb, FUNC@PLT)
bsr rx

and set R_CKCORE_PLT12 relocation for the linker. The linker create an entry in GOT and
an entry in PLT, computes the index in GOT for the called function symbol of which the value
is stored in GOT, set R_CKCORE_JMP_SLOT relocation for dynamic linkage.
R_CKCORE_PLT_HI16 & R_CKCORE_PLT_LO16
In C-SKY V2.0 instruction set, there is instructions ld/st which use 12 disp to the base address
register. When calling a global FUNC in text section, the compiler and assembler create the code
such as:
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

41

Chapter 4. ELF file format

movih rx, FUNC@PLT_HI16
ori rx, FUNC@PLT_LO16
ldr.w rx, (gb, rx<<2)
jsr
rx

and set R_CKCORE_PLT_HI16, R_CKCORE_PLT_LO16 relocation for the linker. The
linker create an entry in GOT and an entry in PLT, computes the index in GOT for the called
function symbol of which the value is stored in GOT , set R_CKCORE_JMP_SLOT relocation
for dynamic linkage, and changes the immediate fields in the 32-bit movih/ori instructions with
the index in GOT.
R_CKCORE_ADDRPLT
In C-SKY V1.0, when calling a global FUNC in text section of the executableprogram, the
compiler and assembler create the code such as:
lrw rx, FUNC@ADDRPLT
ld ry,(rx, 0)
jsr ry

set R_CKCORE_ADDRPLT relocation for the linker. The linker create an entry in GOT and
an entry in PLT, computes the GOT entry address for the called function symbol of which the
value is stored in GOT, set R_CKCORE_JMP_SLOT relocation for dynamic linkage, and and
changes the immediate fields of the 16-bit lrw instructions with the GOT entry address.
R_CKCORE_ADDRPLT_HI16 & R_CKCORE_ADDRPLT_LO16
In C-SKY V2.0, when calling a global FUNC in text section of the executable program, the
compiler and assembler create the code such as:
movih rx, FUNC@ADDRPLT_HI16
ori rx, FUNC@ADDRPLT_LO16
ld ry,(rx, 0)
jsr ry

set R_CKCORE_ADDRPLT_HI16 & R_CKCORE_ADDRPLT_LO16 relocation for the
linker. The linker create an entry in GOT and an entry in PLT, computes the GOT
entry address for the called function symbol of which the value is stored in GOT, set
R_CKCORE_JMP_SLOT relocation for dynamic linkage, and changes the immediate fields
in the 32-bit movih/ori instructions with the entry address.
Table 4.9 describes the function of relocation types for PIC, and when they are deal with.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

42

Chapter 4. ELF file format

Fields
Text

For What
Loading GOT Base
Address
Refer to Local Data
or Function
Refer to Global Data
or Function

Call local function directly
Call global function directly

Data

Refer to local data or
function
Refer to Global Data or
function

Table 4.9: Relocation Types for PIC
Type in Object File(.o)
R_CKCORE_GOTPC
R_CKCORE_GOTPC_HI16
R_CKCORE_GOTPC_LO16
R_CKCORE_GOTOOFF
R_CKCORE_GOTOFF_HI16
R_CKCORE_GOTOFF_LO16
R_CKCORE_GOT32
R_CKCORE_GOT12
R_CKCORE_GOT_HI16
R_CKCORE_GOT_LO16
R_CKCORE_ADDRGOT
R_CKCORE_ADDRGOT_HI16
R_CKCORE_ADDRGOT_LO16
R_CKCORE_GOTOOFF
R_CKCORE_GOTOFF_HI16
R_CKCORE_GOTOFF_LO16
R_CKCORE_PLT32
R_CKCORE_PLT12
R_CKCORE_PLT_HI16
R_CKCORE_PLT_LO16
R_CKCORE_ADDRPLT
R_CKCORE_ADDRPLT_HI16
R_CKCORE_ADDRPLT_LO16
R_CKCORE_ADDR32
w/section
R_CKCORE_ADDR32 w/sym

Type in .so
NULL

NULL

R_CKCORE_GLOB_DAT

NULL

R_CKCORE_JMP_SLOT

R_CKCORE_RELATIVE
R_CKCORE_ADDR32 w/sym

4.5 Program Loading
As the system creates or augments a process image, it logically copies a file segment to a virtual memory
segment. When and if the system physically reads the file depends on the program’s execution behavior,
system load, etc. A process does not require a physical page unless it references a logical page during
execution. Processes commonly leave many pages unreferenced; therefore delaying physical reads frequently
obviates them, improving system performance. To obtain this efficiency in practice, executable and shared
object files must have segment images whose virtual addresses are zero, modulo the file system block size.
Virtual addresses and file offsets for C-SKY V2 CPU segments are congruent modulo 64 KByte (0x10000) or
larger powers of 2. Because 64 KBytes is the maximum page size, the files are suitable for paging regardless
of physical page size.
Because the page size can be larger than the alignment restriction offset, up to four file pages can hold
impure text or data (depending on page size and file system block size).
• The first text page contains the ELF header, the program header table, and other information.
• The last text page can hold a copy of the beginning of data.
• The first data page can have a copy of the end of text.
• The last data page can contain file information note relevant to the running process.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

43

Chapter 4. ELF file format

Figure 4.1: Executable File Example

Figure 4.2: Program Header Segments

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

44

Chapter 4. ELF file format

Logically, the system enforces the memory permissions as if each segment were complete and separate;
segment addresses are adjusted to ensure each logical page in the address space has a single set of permissions.
In the example in Figure 4-15 Executable File example, the file region holding the end of text and the
beginning of data is mapped twice: once at one virtual address for text and once at a different virtual
address for data.
The end of the data segment requires special handling for uninitialized data which the system defines to begin
with zero values. Thus if the last data page of a file includes information not in the logical memory page, the
extraneous data must be set to zero, rather than the unknown contents of the executable file. ‘‘Impurities’’
in the other three pages are not logically part of the process image; whether the system expunges them is
unspecified.
One aspect of segment loading differs between executable files and shared objects. Executable file segments
typically contain absolute code [see “PIC Examples“]. To let the process execute correctly, the segments
must reside at the virtual addresses used to build the executable file, with the system using the p_vaddr
values unchanged as virtual addresses. Shared object segments typically contain position-independent code,
allowing a segment virtual address to change from one process to another without invalidating execution
behavior. Though the system chooses virtual addresses for individual processes, it maintains the relative
positions of the segments. Because position independent code uses relative addressing between segments,
the difference between virtual addresses in memory must match the difference between virtual addresses in
the file. The following table shows possible shared object virtual address assignments for several processes,
illustrating constant relative positioning. The table also illustrates the base address computations.

Figure 4.3: Shared Object Segment Address Example

4.6 Dynamic Linking
When the system creates a process image, the executable file portion of the process has fixed addresses,
and the system chooses shared object library virtual addresses to avoid conflicts with other segments in the
process. To maximize text sharing, shared objects conventionally use position-independent code, in which
instructions contain no absolute addresses. Shared object text segments can be loaded at various virtual
addresses without changing the segment images. Thus multiple processes can share a single shared object
text segment, even though the segment resides at a different virtual address in each process.
Position-independent code relies on two techniques:
• Control transfer instructions hold addresses relative to the program counter (PC). A PC-relative branch
or function call computes its destination address in terms of the current program counter, not relative
to any absolute address. If the target location exceeds the allowable offset for PC relative addressing,
the program requires an absolute address.
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

45

Chapter 4. ELF file format

• When the program requires an absolute address, it computes the desired value. Instead of embedding
absolute addresses in the the instructions, the compiler generates code to calculate an absolute address
during execution.
Because the processor architecture provides PC relative call, register call and branch instructions, compilers
can easily satisfy the first condition.
A global offset table provides information for address calculation. Position-independent object files (executable and shared object files) have a table in their data segment that holds addresses. When the system
creates the memory image for an object file, the table entries are relocated to reflect the absolute virtual
addresses assigned for an individual process. Because data segments are private for each process, the table
entries can change - whereas text segments do not change because multiple processes share them.
In C-SKY V1.0, because the 4-bit offset field of load and store instructions, the global offset table is limited
to 16 entries (64 bytes), that means 4-bit offset field of load and store can not be used here, instead, we must
use load #offset with “lrw rx, #offset” instruction into rx, add gb to rx, then load the value of the entry in
GOT with “ldw rz, (rx, 0)”, see Figure 4-26 Load & Store for PIC. Oh, my god!, so we have 1G entries (4G
bytes) in GOT now.
In C-SKY V2.0, due to the 12-bit offset field of ldw and stw instructions, we use ldw instruction to load the
value of one GOT entry, so the global offset table is limited to 4096 entries (4096 words).

4.6.1 Dynamic Section
Dynamic section entries give information to the dynamic linker. Some of this information is processor-specific,
including the interpretation of some entries in the dynamic structure.
DT_PLTGOT
On the C-SKY V2 CPU architecture, this entry’s d_ptr member gives the address of the first
entry in the global offset table. As mentioned below, the first three global offset table entries are
reserved, and two are used to hold procedure linkage table information.

4.6.2 Global Offset Table
Position-independent code cannot, in general, contain absolute virtual addresses. Global offset tables hold
absolute addresses in private data, thus making the addresses available without compromising the positionindependence and sharability of a program’s text. A program references its global offset table using positionindependent addressing and extracts absolute values, thus redirecting position-independent references to
absolute locations.
Initially, the global offset table holds information as required by its relocation entries. After the system
creates memory segments for a loadable object file, the dynamic linker processes the relocation entries,
some of which will be type R_CKCORE_GLOB_DAT referring to the global offset table. The dynamic
linker determines the associated symbol values, calculates their absolute addresses, and sets the appropriate
memory table entries to the proper values. Although the absolute addresses are unknown when the link
editor builds an object file, the dynamic linker knows the addresses of all memory segments and can thus
calculate the absolute addresses of the symbols contained therein.
If a program requires direct access to the absolute address of a symbol, that symbol will have a global offset
table entry. Because the executable file and shared objects have separate global offset tables, a symbol’s
address may appear in several tables. The dynamic linker processes all the global offset table relocations
before giving control to any code in the process image, thus ensuring the absolute addresses are available
during execution.
The first entry (entry 0) in the table is reserved to hold the address of the dynamic structure, referenced
with the symbol _DYNAMIC. This allows a program, such as the dynamic linker, to find its own dynamic
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

46

Chapter 4. ELF file format

structure without having yet processed its relocation entries. This is especially important for the dynamic
linker, because it must initialize itself without relying on other programs to relocate its memory image. On
the C-SKY V2 CPU architecture, the second and third entries the global offset table also are reserved. The
second entry (entry 1) is reserved for the ID of this module in the dynamic linker, and the third entry (entry
2) is reserved for a function address in the dynamic linker(dl_linux_reslove), which is used in PLT. See “
Procedure Linkage Table “.
The system may choose different memory segment addresses for the same shared object in different programs;
it may even choose different library addresses for different executions of the same program. Nonetheless,
memory segments do not change addresses once the process image is established. As long as a process exists,
its memory segments reside at fixed virtual addresses.
A global offset table’s format and interpretation are processor-specific. For the C-SKY V2 CPU architecture,
the symbol _GLOBAL_OFFSET_TABLE_ may be used to access the table.
extern Elf32_Addr _GLOBAL_OFFSET_TABLE_[];

The symbol _GLOBAL_OFFSET_TABLE_ must be the base of the .got section, allowing non-negative
“subscripts’’ into the array of addresses.

4.6.3 Function Address
References to the address of a function from an executable file and the shared objects associated with it must
resolve to the same value. References from within shared objects will normally be resolved by the dynamic
linker to the virtual address of the function itself. References from within the executable file to a function
defined in a shared object will normally be resolved to the real address of the function within the executable
file.

4.6.4 Procedure Linkage Table
Much as the global offset table redirects position-independent address calculations to absolute locations, the
procedure linkage table redirects position-independent function calls to absolute locations. The link editor
cannot resolve execution transfers (such as function calls) from one executable or shared object to another.
Consequently, the link editor arranges to have the program transfer control to entries in the procedure
linkage table. On the C-SKY V2 CPU architecture, procedure linkage tables reside in shared text, but they
use addresses in the private global offset table. The dynamic linker determines the destinations’ absolute
addresses and modifies the global offset table’s memory image accordingly. The dynamic linker thus can
redirect the entries without compromising the position-independence and sharability of the program’s text.
Following the steps below, the dynamic linker and the program “cooperate’’ to resolve symbolic references
through the procedure linkage table and the global offset table.
1. When first creating the memory image of the program, the dynamic linker sets the second and the
third entries in the global offset table to special values. Steps below explain more about these values.
2. If the procedure linkage table is position-independent, the address of the global offset table must reside
in gb. Each shared object file in the process image has its own procedure linkage table, and control
transfers to a procedure linkage table entry only from within the same object file.
Consequently, the calling function is responsible for setting the global offset table base register before
calling the procedure linkage table entry. So the compiler must create codes to calculate the global
offset table base, and set it in gb (GOT base register) at the prologue of the calling function, Just like:

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

47

Chapter 4. ELF file format

Func:
...
/* Save registers, such as gb, r15, and others */
bsr L1
/* r15 = L1 = PC+2 now */
L1:
/* R_CKCORE_GOTPCHI16 & ~_GOTPCLO16 in C-SKY V2.0*/
/* R_CKCORE_GOTPC in C-SKY V1.0 */
/* GOTPC is a flag for assembler */
lrw gb , L1@GOTPC
/* lrw is a pseudo instruction in C-SKY V2.0 */
add gb , r15
/* so gb = $GOT */
...
/* alloc stack space for local variables */

3. For illustration, assume the program calls name1, then the compiler creates the function calling, such
as:
Func:
...
/* Calling name1 function created by compiler, r13 can be other registers */
/* name1@GOT is a flag for assembler */
lrw r13, name1@GOT
/* r13 = index * 4 = name1@GOT -$GOT */
add r13, gb
ld r13, (r13, 0)
/* r13 = *(name1@GOT) */
jsr r13
Func:
...
/* Calling name1 function created by compiler, r13 can be other registers */
/* name1@GOT is a flag for assembler */
ld r13, (gb, name1@GOT) /* r13 = *(name1@GOT), offset < 4096 */
jsr r13

4. Initially (first time to calling name1), If the dynamic linker is using lazy binding technique,
(name1@GOT) in the global offset table holds the address of the instructions in PLT, not the real
address of name1. So calling name1 ( jsr r13 instruction ) transfers control to the label .PLT1.
If the lazy binding technique is not used in dynamic linker, or the second time to calling name1 when
lazy binding, the global offset table holds the real address of name1, the dynamic linking is finished.
So if binding directly in the dynamic linker, we need not PLT.
5. For lazy binding, in PLT, each entry includes some instructions, just like Figure 4-21 Codes in PLT
Entry in C-SKY V1.0 and Figure 4-22 Codes in PLT Entry in C-SKY V2.0:
.PLT1:
/* for calling name1 */
subi r0, 32
/* to save arguments in stack for name1 */
stw r2, (r0, 0)
stw r3, (r0, 4)
/* load the function address in the dynamic linker */
ldw r2, ( gb , 8)
/* Prepare the arguments in r2&r3 for the dynamic linker */
lrw r3, #offset
/* the offset of relocation for name1 in .reloc */
/* we need not load the ID of this module in the dynamic linker */
/* ID can be gotten with gb(GOT base address) */
jmp r2
/* transfer the control to the dynamic linker*/
.PLT2:
...

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

48

Chapter 4. ELF file format

.PLT1:
/* for calling name1 */
/* load the function address in the dynamic linker */
ldw t0, ( gb , 8)
/* Prepare the arguments in r2&r3 for the dynamic linker */
lrw t1, #offset
/* the offset of relocation for name1 in .reloc */
/* we need not load the ID of this module in the dynamic linker */
/* ID can be gotten with gb(GOT base address) */
jmp t0
/* transfer the control to the dynamic linker*/
.PLT2:
...

6. At first, we must save all arguments of name1 on the stack, but does not save link register (r15). So
the dynamic linker need not save r2~r7 any more. But must save r8 ~r15 if they are used in dynamic
linker.
7. Secondly, the program load the relocation offset (offset) in .dynamic section to r2. The relocation offset
is a 32-bit, non-negative byte offset into the relocation table. The designated relocation entry will have
type R_CKCORE_JMP_SLOT, and its offset will specify the global offset table entry used in step 3.
The relocation entry also contains a symbol table index, thus telling the dynamic linker what symbol
is being referenced, name1 in this case.
8. After getting the relocation offset, the program places the value of the second global offset table entry
(GOT+ 4)/( gb , 4) into r3, thus giving the dynamic linker one word of identifying information. The
program then jumps to the address in the third global offset table entry (GOT + 8)/( gb , 8), which
transfers control to the dynamic linker.
9. When the dynamic linker receives control, it looks at the designated relocation entry, finds the symbol’s
value, stores the “real’’ address for name1 in its global offset table entry, and transfers control to the
desired destination. For example, the implement of _dl_linux_resolve function in the dynamic linker
of uClibc, see Figure 4-23 _dl_linux_resolve Function in the Dynamic linker in C-SKY V1.0 and
Figure 4-24 _dl_linux_resolve Function in the Dynamic linker in C-SKY V2.0
_dl_linux_resolve:
stw r4, (r0,8)
/* to save arguments in stack for name1 */
stw r5, (r0,12)
stw r6, (r0,16)
stw r7, (r0,20)
stw r15,(r0,24)
ldw r2, (gb,4)
/* load the ID of this module */
bsr _dl_linux_resolver /* r2 = id, r3 = offset(do it in plt*) */
mov r1, r2
/* the address of function is in r2 */
ldw r2, (r0,0)
/* Restore the argument of the called function */
ldw r3, (r0,4)
ldw r4, (r0,8)
ldw r5, (r0,12)
ldw r6, (r0,16)
ldw r7, (r0,20)
ldw r15,(r0,24)
addi r0, 32
/* Restore the r0, because r0 is subtracted in PLT table */
jmp r1
/* call the function without saving pc */
_dl_linux_resolve:
subi sp, 32
stm a0-a6, (sp, 0)
stw lr, (sp, 24)
ldw a0, (gb, 4)

/* to save arguments in stack for name1 */
/* load the ID of this module */
(continues on next page)

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

49

Chapter 4. ELF file format

(continued from previous page)

mov a1, t1
bsr _dl_linux_resolver
mov t0, a0
ldm a0-a6, (sp, 0)
ldw lr, (sp, 24)
addi sp, 32
jmp t0

/*
/*
/*
/*

offset in .relocation */
a0 = id, a1 = offset(do it in plt*) */
the address of function is in a0 */
Restore the argument of the called function */

/* Restore the sp */
/* jump to the function without saving pc */

10. Subsequent instructions at step 3 will call directly to name1, without calling the dynamic linker a
second time. That is, the jsr instruction at step 3 will transfer to name1, instead of transferring to the
.PLT1 instruction.
The LD_BIND_NOW environment variable can change dynamic linking behavior. If its value is non-null,
the dynamic linker evaluates procedure linkage table entries before transferring control to the program.
That is, the dynamic linker processes relocation entries of type R_CKCORE_JMP_SLOT during process
initialization. Otherwise, the dynamic linker evaluates procedure linkage table entries lazily, delaying symbol
resolution and relocation until the first execution of a table entry.

4.7 PIC Examples
This section discusses example code sequences for basic operations such as calling functions, accessing static
objects, and transferring control from one part of a program to another. As before, examples use the ANSI
C language. Other programming languages may use the same conventions displayed below, but failure to do
so does not prevent a program from conforming to the ABI. Two main object code models are available.
Absolute code Instructions can hold absolute addresses under this model. To execute properly,
the program must be loaded at a specific virtual address, making the program absolute addresses
coincide with the process virtual addresses.
Position-independent code Instructions under this model hold relative addresses, not absolute
addresses. Consequently, the code is not tied to a specific load address, allowing it to execute
properly at various positions in virtual memory.
The following sections describe the differences between absolute code and position-independent code. Code
sequences for the models (when different) appear together, allowing easier comparison
Note The examples below show code fragments with various simplifications. They are intended to explain
addressing modes, not to show optimal code sequences or to reproduce compiler output or actual
assembler syntax.

4.7.1 Function proglogue for PIC
This section describes the function prologue for position-independent code. A function prologue first calculates the address of the global offset table, leaving the value in register gb, This calculation is a constant
offset between the text and data segments, known at the time the program is linked.
The offset between the start of a function and the global offset table (known because the global offset table
is kept in the data segment) is added to the virtual address of the function to derive the virtual address of
the global offset table. This value is maintained in the gb register throughout the function.
After calculating the gb, a function allocates the local stack space, the gb is a called saved register. See the
codes in Figure 4-18 Codes to caculate GOT base address

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

50

Chapter 4. ELF file format

4.7.2 Date Objects
This section describes data objects with static storage duration. The discussion excludes stack-resident
objects, because programs always compute their virtual addresses relative to the stack pointer.

Figure 4.4: Absolute Load And Store
Position-independent instructions cannot contain absolute addresses. Instead, instructions that reference
symbols hold the symbols’ offsets into the global offset table. Combining the offset with the global offset
table address in gb gives the absolute address of the table entry holding the desired address .

Figure 4.5: Load And Store For PIC

4.7.3 Function Call
C-SKY V1 CPU Programs use the jump and link instruction, jsri, to make direct function calls, since the jsri
instruction provides 32 bits of address, direct function calls can appoach full address space (0 ~ 4 GByte),

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

51

Chapter 4. ELF file format

but C-SKY V2 CPU use the jump and link instruction, bsr, to make direct function calls, since the bsr
instruction provides 26 bits of address, direct function calls can appoach 256 Mbyte address space.

Figure 4.6: Absolute Direct Function Calling
Other indirect function calls are done by computing the address of the called function into a register and
using the jump and link register, jsr.

Figure 4.7: Absolute Indirect Function Calling
Calling position independent code functions is always done with the jsr instruction. The global offset table
holds the absolute addresses of all position independent functions.

Figure 4.8: PIC Function Calling

4.7.4 Branching
C-SKY V2 CPU programs use branch instructions to control execution flow. As defined by the architecture,
branch instructions hold a PC-relative value with a 2 KByte range, allowing a jump to locations up to 2
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

52

Chapter 4. ELF file format

KBytes away in either direction.

Figure 4.9: Branching
C switch statements provide multiway selection. When case labels of a switch statement satisfy grouping
constraints, the compiler implements the selection with an address table. The address table is placed in a
.rdata section; this so the linker can properly relocate the entries in the address table. Figure 4-31 Absolute
Switch Codes and Figure 4-32 PIC Switch Codes use the following conventions to hide irrelevant details:
• The selection expression resides in register r7(C-SKY V1.0), t0(C-SKY V2.0).
• Case label constants begin at zero.
• Case labels, default, and the address table use assembly names. Lcasei, .Ldef, and .Ltab, respectively.
Address table entries for absolute code contain virtual addresses; the selection code extracts the value of an
entry and jumps to that address. Position-independent table entries hold offsets; the selection code compute
the absolute address of a destination.

Figure 4.10: Absolute Switch Codes

4.8 Debugging Information Format
Currently, CSKY V2 toolchain uses DWARF 2.0 described in System V Application Binary Interface, demised
by Santa Cruz Operation, Inc, as it’s internal implementation of debugging support.
Moreover, we don’t extend the standard DWARF 2.0 format by now. Nevertheless, we would augument it
by adding some extensions to standard DWARF 2.0 format in the future.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

53

Chapter 4. ELF file format

Figure 4.11: PIC Switch Codes

4.8.1 DWARF Register Numbers
DWARF generally describes the steps a debugger takes to locate variables in a pro- gram being debugged
in machine-independent terms. However, the way in which the OP_REG and OP_BASEREG atoms are
handled is machine-specific — these atoms require that a value (or the pointer to a value) be contained in a
machine-specific reg- ister.
Table 4.10 DWARF Register Atom Mapping for C-SKY V1 CPU shows the mapping between the values
used in those atoms and the CKCORE register set. The entries for r0 through r15 specify the currently
active set of general purpose registers; this is usually the primary register set. The entries for r0’ through
r15’ specify the alternate register file. The control registers are encoded from 32 through 63.
Table 4.10: DWARF Register Atom Mapping for C-SKY V1
Atom Register Atom Register Atom Register Atom
0
r0
1
r1
2
r2
3
4
r4
5
r5
6
r6
7
8
r8
9
r9
10
r10
11
12
r12
13
r13
14
r14
15
16
r0’
17
r1’
18
r2’
19
20
r4’
21
r5’
22
r6’
23
24
r8’
25
r9’
26
r10’
27
28
r12’
29
r13’
30
r14’
31
32
cr0
33
cr1
34
cr2
35
36
cr4
37
cr5
38
cr6
39
40
cr8
41
cr9
42
cr10
43
44
cr12
45
cr13
46
cr14
47
48
cr16
49
cr17
50
cr18
51
52
cr20
53
cr21
54
cr22
55
56
cr24
57
cr25
58
cr26
59
60
cr28
61
cr29
62
cr30
63
64
pc

Release 2.1

CPU
Register
r3
r7
r11
r15
r3’
r7’
r11’
r15’
cr3
cr7
cr11
cr15
cr19
cr23
cr27
cr31

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

54

Chapter 4. ELF file format

Table 4.11: DWARF Register Atom Mapping for C-SKY V2
Atom Register Atom Register Atom Register Atom
0
r0
1
r1
2
r2
3
4
r4
5
r5
6
r6
7
8
r8
9
r9
10
r10
11
12
r12
13
r13
14
r14
15
16
r16
17
r17
18
r18
19
20
r20
21
r21
22
r22
23
24
r24
25
r25
26
r26
27
28
r28
29
r29
30
r30
31
32
cr0
33
cr1
34
cr2
35
36
cr4
37
cr5
38
cr6
39
40
cr8
41
cr9
42
cr10
43
44
cr12
45
cr13
46
cr14
47
48
cr16
49
cr17
50
cr18
51
52
cr20
53
cr21
54
cr22
55
56
cr24
57
cr25
58
cr26
59
60
cr28
61
cr29
62
cr30
63
64
pc
65
r0’
66
r1’
67
68
r3’
69
r4’
70
r5’
71
72
r7’
73
r8’
74
r9’
75
76
r11’
77
r12’
78
r13’
79
80
r15’

Release 2.1

CPU
Register
r3
r7
r11
r15
r19
r23
r27
r31
cr3
cr7
cr11
cr15
cr19
cr23
cr27
cr31
r2’
r6’
r10’
r14’

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

55

CHAPTER

5

Runtime library

The most of libraries are dependent on platform and OS. In the view of this, they are beyond the scope
of this document and wouldn’t be addressed here. Some library functions are required to provide support
for operations that are not supported directly by the C-SKY V2 CPU hardware. These library routines are
specified in this section.
This chapter consists of following sections.
• Compiler assisted Libraries
• Floating Point Routines
• Long Long integer Routines

5.1 Compiler assisted Libraries
Currently, the C-SKY V2 CPU doesn’t support those instructions operating on floating point number or
long long data types. Compilers should provide the functionality for some of these operations through the
use of support library routines. The C-SKY V2 CPU Technology Center requires a single shared support
library for all tool sets to eliminate redundant code.
The functions to be provided through support routines include:
1. Floating point math routines
2. Long long routines
Compilers that generate in-line code to provide these functions must make no refer- ences to the library
functions.
Compilers that provide these functions by generating subroutine calls to the support libraries must use the
standard interfaces.
In particular, it is required to link objects produced with different tool sets into single executables as follows.
• Compiler support library names wouldn’t clash between tool sets

56

Chapter 5. Runtime library

• Compiler support routines are comformed with linkage rules
• Linkers from different tool sets must either use the same support library names and interfaces, or
provide a mechanism to indicate where support libraries can be found.
• Routines in the support libraries must satisfy the following constraints.
– The only external state information used is floating point rounding mode
– No global state can be modified
– Identical results must be returned when a routine is re-invoked with the same input arguments
– Multiple calls with the same input arguments can be collapsed into a single call with a cached
result
These properties permit a compiler to make assumptions about variable lifetimes across library subroutine
calls that values in memory won’t change, and previously de-referenced pointers need not be de-referenced
again.

5.2 Floating Point Routines
These routines conform with ABI linkage conventions concerning registers that must be preserved across
function calls. The routines have no side effects. They do not modify memory except as noted, thus allowing
compilers to optimize de-referenced pointer values across calls. The routines always return the same value
for the same inputs, allowing compilers to optimize subsequent calls away.
The data formats are as specified in IEEE 754. The routines are not required to compute results as specified
in IEEE 754. Implementations of these routines must document the degree to which operations conform to
the IEEE standard. Not all users of floating point require IEEE 754 precision and exception handling, and
may not want to incur the overhead that complete conformance requires.

5.2.1 Arithmetic functions
Table 5.1: Floating point arithmetic functions
Functions
Description
double __adddf3(double a, double b) addition of a and b with double precision.
double __subdf3(double a, double b) subtract of a and b with double precision.
double __muldf3(double a, double b) multiple of a and b with double precision.
double __divdf3(double a, double b)
division of a and b with double precision.
double __negdf2(double a)
negative a of type double precision.
float __addsf3(float a, float b)
addition of a and b with single precision.
float __subsf3(float a, float b)
subtract of a and b with single precision.
float __mulsf3(float a, float b)
multiply of a and b with single precision.
float __divsf3(float a, float b)
division of a and b with single precision.
float __negsf2(float a)
negative a of type single precision.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

57

Chapter 5. Runtime library

5.2.2 Conversion functions
Table
Functions
double __extendsfdf2(float a)
float __truncdfsf2(double a)
int __fixsfsi(float a)
int __fixdfsi(double a)
long long __fixsfdi(float a)
long long __fixdfdi(double a)
unsigned int __fixunssfsi (float a)
unsigned int __fixunsdfsi (double
a)
unsigned long long __fixunssfdi
(float a)
unsigned long long __fixunsdfdi
(double a)
float __floatsisf (int i)
double __floatsidf (int i)
float __floatdisf (long i)
double __floatdidf (long i)
float __floatunsisf (unsigned int
i)
double __floatunsidf (unsigned
int i)
float __floatundisf (unsigned
long i)
double __floatundidf (unsigned
long i)

Release 2.1

5.2: Floating point conversion functions
Description
extending single precisio to double.
truncating double precison to single.
convert a to an signed integer, rounding toward zero
convert a to a signed long long, rounding toward zero
convert a to an unsigned integer, rounding toward zero. Negative
values all become zero
convert a to an unsigned long, rounding
toward zero. Negative values all become
convert i, a signed integer, to floating point
convert i, a signed long, to floating point
convert i, an unsigned integer, to floating
point

convert i, an unsigned long, to floating point

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

58

Chapter 5. Runtime library

5.2.3 Comparison functions
Table 5.3: Floating point comparison functions
Functions
Description
int __cmpsf2 (float a, float b)
These functions compare a with b. Return ing -1
when a less b, 0 when a equals b, otherwise return
int __cmpdf2 (double a, double b)
1. Also if eigthr argum ent is NaN returning 1.
int __unordsf2 (float a, float b)
When either a or b is NaN, returning nonz ero value.
Otherwise returning zero. There is also a complete
group of higher level functions which correspond
directly to comparison operators. They implement
the ISO C semantics for floating-point comparisons,
taking NaN into account. Pay careful attention to
the return values defined for each set. Under the
hood, all of these routines are implemented as
int __unorddf2 (double a, double b)

if (__unordXf2 (a, b))
return E;
return __cmpXf2 (a, b);

where E is a constant chosen to give
the proper behavior for NaN. Thus, the
mean ing of the return value is different
for each set. Do not rely on this implementation; only the semantics documented below are guaranteed.
int
int
int
int
int

__eqsf2 (float a, float b)
__eqdf2 (double a, double b)
__nesf2 (float a, float b)
__nedf2 (double a, double b)
__gesf2 (float a, float b)

int __gedf2 (double a, double b)
int __ltsf2 (float a, float b)
int __ltdf2 (double a, double b)
int __lesf2 (float a, float b)
int __ledf2 (double a, double b)
int __gtsf2 (float a, float b)
int __gtdf2 (double a, double b

These functions return zero if neither argument is
NaN, and a and b are equal.
These functions return a nonzero value if either argument is NaN, or if a and b are unequal.
These functions return a value greater than or equal
to zero if neither argument is NaN, and a is greater
than or equal to b.
These functions return a value less than zero if neither argument is NaN, and a is strictly less than
b.
These functions return a value less than or equal to
zero if neither argument is NaN, and a is less than
or equal to b.
These functions return a value greater than zero if
neither argument is NaN, and a is strictly greater
than b.

5.3 Long Long integer Routines
These routines comply with ABI linkage conventions concerning registers that must be preserved across
function calls. The routines have no side effects. They do not modify memory except as noted, and thus
allow compilers to optimize de-referenced pointer values across calls. The routines always return the same
value for the same inputs, allowing compilers to optimize subsequent calls away.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

59

Chapter 5. Runtime library

5.3.1 Arithmetic functions
Table 5.4: long long arithmetic functions
Functions
Description
long long __ashldi3 (long long a, int b)
This function return the result of
shifting a left by b bits
long long __ashrdi3 (long long a, int b)
This function return the result of
arithmetically shifting a right by b
bits
long long __lshrdi3 (long long a, int b)
This function return the result of
logically shifting a right by b bits
long __divsi3 (long a, long b)
These functions return the quotient
of
long long __divdi3 (long long a, long long b)
the signed division of a and b
long __modsi3 (long a, long b)
These functions return the remainder
long long __moddi3 (long long a, long long b)
of the signed division of a and b
long long __muldi3 (long long a, long long b)
This function return the product of
a and b
long long __negdi2 (long long a)
This function return the negation of
a
These functions return the
unsigned long __udivsi3 ( unsigned long a, unsigned long quotient of the unsigned division of
b)
a and b
unsigned long long __udivdi3 (unsigned long long a,
unsigned long long b)
unsigned long long __udivmoddi4 (unsigned long long a,
unsigned long long b, unsigned long long *c)

unsigned long __umodsi3 (unsigned long a, unsigned long
b)

This function calculate both the
quotient and remainder of the unsigned division of a and b. The
return value is the quotient, and
the remainder is placed in variable
pointed to by c
These functions return the remainder of the unsigned division of a and
b

unsigned long long __umoddi3 (unsigned long long a,
unsigned long long b)

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

60

Chapter 5. Runtime library

5.3.2 Comparison functions
Table 5.5: long long comparison functions
Functions
Description
int __cmpdi2 (long long a, long long b)
These function perform a signed comparison of a
and b. If a is less than b, they return 0; if a is
greater than b, they return 2; and if a and b are
equal they return 1
int __ucmpdi2 (unsigned long long a, unsigned
These function perform an unsigned
long long b)
comparison of a and b. If a is less than
b, they return 0; if a is greater than b, they
return 2; and if a and b are equal they return
1

5.3.3 Trapping Arithmetic Functions
Table 5.6: long long trapping arithmetic functions
Functions
Description
int __absvsi2 (int a)
These functions return the absolute value
long __absvdi2 (long a)
of a
int __addvsi3 (int a, int b)
These functions return the sum of a and b; that is a + b.
long __addvdi3 (long a, long b)
int __mulvsi3 (int a, int b)
Those functions return product of a and b;
long __mulvdi3 (long a, long b) that is a*b
int __negvsi2 (int a)
These functions return the negation of a; that is -a
long __negvdi2 (long a)
int __subvsi3 (int a, int b)
These functions return the difference
long __subvdi3 (long a, long b) between b and a; that is a - b
all following functions implement trapping arithmetic. These functions call the libc function abort upon
signed arithmetic overflow.

5.3.4 Bit Operations

Functions
int __ffsdi2(long long a)

Release 2.1

Table 5.7: long long bit operations
Description
These functions return the index of the least significant 1-bit in a, or the
value zero if a is zero. The least significant bit is index one

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

61

CHAPTER

6

Assembly syntax and directives

In this chapter, there are several sub sections would be introduced as follows. If you want to focus on the
specified contents, you can click the corresponding link.
• Section
• Input line lengths
• Syntax
• Assembler directives
• Pseudo-Instructions

6.1 Section
The generated file of assembler consists of several sections whose content is determined by the assembler
input. Section containing code is aligned to 2-byte boundary. Section containing data is aligned so that the
alignment requirements of the data contained in the section is preserved.

6.2 Input line lengths
The assembler may limit input lines, but such a limit must be at least 2100 characters in length. This gives
compiler the ability to construct an expression containing a symbol of maximum supported length (2048
bytes) and a data-allocation pseudo-instruction. For example.
.long longsymbol

The assembler is allowed to support longer lines. If the assembler imposes a limit on the length of an input
line, the assembler must issue a diagnostic if that limit approached.

62

Chapter 6. Assembly syntax and directives

6.3 Syntax
An assembly source file contains a list of one or more assembler statements. Each statement is terminated
with a newline character or a “;” character except that it appears within string literal or comment. Empty
statements (i.e. blank lines) would be ignored.
Each statement consists of zero or more labels, at most one memonic, with the remainder of the statement
being arguments specific to the memonic.
Labels are symbols that are followed by a “:”. Temporary labels are allowed and are indicated by a non-zero
digit (1–9) instead of a symbol. Duplicate temporary labels are allowed and references to them are resolved
by searching for the nearest source line with the label. References to temporary labels must have a “b” or
“f” suffix appended to the digit to indicate which direction to search.
Labels that begin with “.” ( period ) are considered local labels. The assembler does not include these
symbols in the symbol table of the generated object file. Memmonics fall into three categories: instructions,
pseudo-instructions, and directives. Instruction memonics map one-to-one into an C-SKY V2 CPU opcode.
Pseudo-instructions map into sequences of C-SKY V2 CPU opcodes. Directives always start with a “.” and
are used to control the assembly and allocate data areas. All memonics are case sensitive and must be
specified in lower case.
White space in assembly source files is ignored except as a separator between memonics and when embedded
within string literals or character constants. Multiple white space characters are functionally equivalent to
a single white space character except within literals and character constants.
Comment in assembly file is indicated by several styles as follows.
• “//” sequence indicates a comment reaching to the end of the line.
• “#” character, when not part of a valid preprocessing directive, indicates a comment reaching to the
end of the line.
Comments are terminated only by the end of the line. The “;” character does not terminate comment. A
multi-line comment, e.g. “/* */”, is not supported since most assemblers are inherently line oriented.
Comments can never begin or end within a string literal or character constant.

6.3.1 Preprocessing
The assembler is not required to provide macro preprocessing. This functionality can be provided by existing
preprocessors that conform to the ANSI standard. If the assembler does provide preprocessing, then it must
conform to the “C” language preprocessing standard and the following paragraph does not apply. An
assembler command line option will enable the following behavior. Any line with a “#” character in the
first column is assumed to be line and file information from the preprocessor. The assembler must use this
information in error messages. This allows a programmer to relate an error back to the line and file of the
original source file before preprocessing. The file and line information from the preprocessor is in the form:
# number “ filename ”

Any other preprocessor lines that do not match this form are ignored by treating them as comments.

6.3.2 Symbols
Symbols must begin with a character in the set: a–z, A–Z, . (period), or _ (underscore). The remaining
characters in a symbol may be in that set plus the digits 0–9. Symbols are case sensitive and all characters in

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

63

Chapter 6. Assembly syntax and directives

the symbol are significant. Symbols may be limited in length but that limit must be at least 2048 characters.
If there is a limit on symbol length, symbols that exceed the limit must cause an error message to be emitted.
Silent truncation of long symbols is undesirable. This is intended to avoid silent errors where two long
symbols differ only at some point after the tools have stopped keeping track of significant characters. The
“$” character is not allowed in a symbol name because it is not a universally supported character on non-U.S.
keyboards.
The special symbols created by temporary labels can only be referenced within a single source file. These
references must consist of a single digit followed by a “b” or “f” to indicate the direction of the nearest
matching label. The “.” symbol will always indicate the current location within the current section at the
start of the current statement. Thus:
movi r3,15
br
.
br .

results in three instructions, two of which branch to themselves. The “.” symbol is used instead of “*”
because it avoids conflicts with “*” as a multiply operator.

6.3.3 Constants
The same constants and lexical expression of constants that are available in C are allowed in the assembly.
This includes hex, octal, decimal, float, double, character, and strings. Both character and string constants
have characters, ‘ and “ respectively, to delimit them. Multiple characters within character constant are each
treated like a base 256 number. e.g. ‘1234’ equals 0x31323334.
The syntax of constants is chosen to be familiar to C programmers. The use of special characters in the
syntax for constants must be avoided as they are used in expressions. In addition, the “$” character is not
a universally supported character on non-U.S. keyboards.

6.3.4 Expressions
Addition, subtraction, multiplication, division, modulus, logical anding, inclusive oring, exclusive oring,
negating, complementing, and shifting operations are supported by the assembler for the generation of
constants or relocatable expressions in the argument portion of a statement. These operations have the
semantics and precedence of their equivalent C language operations. Parenthesis can be used to force
particular bindings of operations. All operations are done as if on 32-bit unsigned values. The syntax of
expressions is chosen to be familiar to C programmers.
Expressions can involve more than one relocatable value as long as the assembler can resolve the expression
to remove all or all but one of the relocatable values. For example, the difference between two labels in the
same section reduces to an assemble time constant.
Relocatable expressions must evaluate down to a possibly-zero offset from a relocat- able address. The linker
is not required to provide the ability to store the value “5 times the value of this relocatable symbol”.

6.3.5 Oprators and Precedence
Table 6.1 shows the operators available to the assembly programmer. The table is arranged in order of
precedence; the higher precedence operators appear earlier in the table. These are the same operators used
in the C language.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

64

Chapter 6. Assembly syntax and directives

Table 6.1: Assembly Expression Operators
Assembly Expression Operators
Precedence
unary negation
1
~
unary logical complement
*
multiplication
2
/
division
%
modulus
+
addition
3
subtraction
<<
left shift
4
>>
right shift
&
logical and
5
^
logical exclusive or
6
|
logical inclusive or
7
Operations may be grouped with parentheses to force a particular precedence.

6.3.6 Instruction Memonics
The instruction opcode mnemonics are listed in the C-SKY V2 CPU Reference Manual.

6.3.7 Instruction Arguments
Register arguments within the argument portion of a statement are indicated by the character, “r” or “R”
followed by the register number (0 through 15). Register 0 (r0) can also be specified as “sp”.
Instructions that use the PC relative indirect addressing (lrw, jsri, jmpi) take two argu- ment syntaxes. The
first syntax is of the form:
lrw
lrw
lrw
lrw

r0,
r1,
r2,
r3,

0x12345678
0x4321
0x4321
0x4321

he assembler collects these argument values into a literal table, possibly allowing several instructions to
reuse the same slot, and emit them at an appropriate point in the output. Such a point may be after the
nearest unconditional branch. In some situations, such a location might not arise before the span of the
lrw/jsri/jmpi instruction is exhausted. In such cases, the assembler must spill the literal table before the
span is exhausted and provide a branch around the literal table.
The assembler provides a mechanism that allows the user to force a dump of the cur- rently outstanding
literals by using the .literals pseudo-instruction. Any literals that have not yet been emitted are emitted
when this directive is encountered. When the assembler input is exhausted, the assembler emits any literals
that have not yet been emitted, as if a .literals pseudo-instruction was appended to the assembly source.
NOTE
The assembler is allowed, but not required, to attempt to optimize code size by doing “optimal”
literal placement. This interacts with the expansion of jbt and jbf pseudo-operations. Also, if
literals must be output after an instruction that is not an unconditional transfer of control, the
assembler must insure that a branch around the literal table is also generated.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

65

Chapter 6. Assembly syntax and directives

The second form uses a [label] notation for the literal. In this case, the supplied argument is
the label of the address containing the value to be loaded. This gives the assembler programmer
complete control over the placement and sharing of literals.
rw r0,[lit0]
lrw r1,[lit1]
lrw r2,[lit1]
lrw r3,[lit1]
...
.align 4
Lit0: .long 0x12345678
Lit1: .long 0x4321

NOTE
The user is responsible for insuring that the specified label is 4-byte aligned when using the [label]
literal syntax.
The C-SKY V2 CPU instruction set does not directly support position independent code, so it is
up to the assembler programmer or compiler to synthesize PC-relative branches and subroutine
calls. To help support this, a 32-bit PC relative argument type is allowed and is indicated by an
expression that is evaluated as a delta from “.”. Any symbols in the expression must be within
the same section as the instruction so the assembler can resolve it to a constant offset. This can
be done in the following manner (assuming r1 and r15 are available):
bsr .+2
lrw r1,symbol-.
add r1,r15
jsr r1
...
symbol: subi r0,12

6.4 Assembler directives
Assembler directives are used to control the assembly of the source code as well as reserving and/or initializing
areas for data. All assembler directive mnemonics begin with a “.”.
Only the .align, .comm, and .lcomm directives align the location counter to a known boundary. All other
mnemonics, including .long, do not imply alignment. It is up to the assembler programmer or compiler to
explicitly align these locations to avoid runtime misalignment faults. For operations that specify alignment
values (e.g., .align, .comm, and .lcomm), the value specified is log2 of the alignment. For example, the value
“3” specifies 8-byte alignment.
All data values emitted by assembler directives will be in big-endian order. This alignment behavior is
needed to support packed data structures. Packed data structures explicitly allow misaligned fundamental
types to save data space at the expense of additional code to pack and unpack the structures. Note that the
ABI does not specify how a user expresses such misaligned references at the C source level. The directive
syntax in this manual uses “[” and “]” to indicate an optional field. The “{” and “}” syntax indicates zero
or more repetitions of a field.

6.4.1 .align abs-exp [, abs-exp]
ligns the location counter to the boundary indicated by the first constant expression. The integral alignment
argument is log2 of the alignment, e.g. the value “3” specifies 8-byte alignment. Negative alignment values
are treated as zero, indicating 1-byte alignment.
Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

66

Chapter 6. Assembly syntax and directives

The second, optional expression is the value to be filled into the bytes between the old location and new
location. If unspecified, the bytes will be filled with zeros.
NOTE
The maximum alignment allowed is not constrained by the assembler. But in order for the
assembler to be able to resolve expressions between symbols in the section, the linker must
guarantee that the resulting section will be aligned to the largest alignment required within the
section. This can be true for every loadable section from every source file, so large alignments
should be used conservatively to avoid large gaps in the final load image.

6.4.2 .ascii “string” {, “string”}
Reserves and initializes space for one or more strings given. Each assembled string will not be null-terminated
and will fill consecutive addresses. No alignment is implied.

6.4.3 .asciz “string” {, “string”}
Same as .ascii except the strings will be null terminated.

6.4.4 .byte exp {, exp}
Assembles consecutive bytes with the one or more values given by the expression(s). No alignment is implied.
Values larger than eight bits are truncated to fit into eight bits. This also generates a warning diagnostic.

6.4.5 .comm symbol, length [, align]
Declares an area of length bytes in the .bss section that will be shared by different files. If another file
declares a longer length, then the length will be the maximum of all the declared lengths. The alignment, if
specified, is log2 of the alignment. The value “3” specifies 8-byte alignment. The units are the same as in
the .align directive. If no alignment is specified, the assembler will naturally align the symbol according to
the largest natural type that can be contained in an entity of that size. Entities of eight bytes and larger are
8-byte aligned, entities of four bytes are 4-byte aligned, entities of two and three bytes are 2-byte aligned,
single-byte entities are 1-byte aligned.

6.4.6 .data
Equivalent to:
.section .data,”RW”

6.4.7 .double float {, float}
Assembles floating point values into IEEE 64-bit floating point numbers. The numbers will be consecutive
and no alignment is implied.

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

67

Chapter 6. Assembly syntax and directives

6.4.8 .equ symbol, expression
Sets the value of the symbol to the expression. If the expression value cannot be resolved to an absolute or
relocatable value after all assembler passes are complete, the assembly will be aborted with an error.

6.4.9 .export symbol {, symbol}
Causes the symbol to appear in the emitted symbol table in the resulting object file. The symbol may be
defined within the file or it may be defined within an external file.

6.4.10 .fill count [, size [, value]]
Emits count copies of the value given. Only the least significant size bytes of value are replicated. The size
must be a value ranging from one through eight; the default size is one byte. The default value is zero. All
three arguments are integral absolute expressions.

6.4.11 .float float {, float}
Assembles floating point values into IEEE 32-bit floating point numbers. The numbers will be consecutive
and no alignment is implied.

6.4.12 .ident “string”
Places the string in the .comment section of the object file reserved for identification purposes. This is used
for version tracking and source-to-binary audit trails.

6.4.13 .import symbol {, symbol}
Indicates that the symbols are defined externally from this file. All undefined symbols that are not declared
as imported will cause a warning message to be issued by the assembler. Symbols that have been declared
external but are not referenced should not appear in the symbol table of the emitted object file.

6.4.14 .literals
Causes the assembler’s accumulated literal table for the jmpi, jsri, and lrw instructions for the current section
to be emitted. Can be used by the assembler programmer to flush literal tables at the exact point desired.

6.4.15 .lcomm symbol, length [, alignment]
Reserve length bytes for a named local common area in the .bss section. The allo- cations of symbols in the
.bsssection will be in the same order as the .lcomm statements in the source file.
NOTE
Preserving the allocation order allows the compiler to use fixed offsets from a bss pointer to access
several related variables. The optional alignment value is log2 of the desired alignment; a value
of “3” specifies eight byte alignment. If no alignment is specified, the assembler will naturally
align the symbol according to the largest natural type that can be contained in an entity of that

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

68

Chapter 6. Assembly syntax and directives

size. Entities of eight bytes and larger are 8-byte aligned, entities of four bytes are 4-byte aligned,
entities of two and three bytes are 2-byte aligned, single-byte entities are 1-byte aligned.

6.4.16 .long exp {, exp}
Emits four byte values consecutively.

6.4.17 .section name [, “attributes”]
Assemble subsequent statements onto the end of the named section. Section names obey the same syntax
as symbol names. The attributes supported are the access permissions (read, write, and execute) and the
allocation bits (yes or no). Permissions and allocation are indicated by any combination of the letters
RWXANrwxan with no separators between them. The attributes are specified as a quoted string. The
attribute characters are explained in Table 6.2.
Table 6.2: CKCORE Section Attribute Encodings
Section Attribute Encodings
R or r
Section is to be readable.
W or w
Section is to be writable.
X or x
Section contains executable code.
A or a
Section is to be allocated in the loaded image
N or n
Section is NOT to be allocated in the loaded image
A missing attribute list indicates that the section should have all permissions (RWX) and address space will
be allocated in the load map. An empty attribute list (e.g., an empty quoted string) specifies an allocated
but inaccessible section.
A missing attribute list generates the default permissions.
Multiple specifications of a section take the attributes from the first specification of the section.
.sectionsectionname, ” RX ”
.sectionsectionname, ” RW ”

The RW attribute is ignored and the section sectionname will have read and execute permissions.

6.4.18 .short exp {, exp}
Emits two byte values consecutively.

6.4.19 .text
Equivalent to:
.section.text, ” RX ”

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

69

Chapter 6. Assembly syntax and directives

6.4.20 .weak symbol [, symbol]
Specify a weak external symbol definition. If symbol is not otherwise defined at link time, it has the value
zero. Multiple symbols can be specified on the same line.
The assembler also supports several pseudo-instructions which are expanded into one or more machine
instructions.
Some pseudo-instructions are used to delay selection of instructions until relative addresses are resolved. For
example, a smaller relative branch instruction could be emitted instead of a larger absolute jump instruction
if the decision is delayed until the branch distance is known.
Some pseudo-instructions are for the assembler programmers convenience. For example, the “clear the
condition bit” (clrc) instruction is another mnemonic for a compare of r0 being not equal to r0. Also, the
mnemonics for the load/store instructions (ldb, ldh, ldw, stb, sth, stw) have alternate forms (ld.b, ld.h, ld.w,
st.b, st.h, st.w). Other pseudo-instructions are used to get C-SKY V2.0 compatible with V1.0, for example,
“movt” does exist in V2.0 instruction set, but can be replaced by “inct”.

6.5 Pseudo-Instructions
The assembler also supports several pseudo-instructions (as showed in Table 6.3) which are expanded into
one or more machine instructions.
Some pseudo-instructions are used to delay selection of instructions until relative addresses are resolved. For
example, a smaller relative branch instruction could be emitted instead of a larger absolute jump instruction
if the decision is delayed until the branch distance is known.
Some pseudo-instructions are for the assembler programmers convenience. For example, the “clear the
condition bit” (clrc) instruction is another mnemonic for a compare of r0 being not equal to r0. Also, the
mnemonics for the load/store instructions (ldb, ldh, ldw, stb, sth, stw) have alternate forms (ld.b, ld.h, ld.w,
st.b, st.h, st.w). Other pseudo-instructions are used to get C-SKY V2.0 compatible with V1.0, for example,
“movt” does exist in V2.0 instruction set, but can be replaced by “inct”.

Pesudo
clrc
cmplei rd,n

cmpls rd,rs

cmpgt rd,rs

jbsr label

Table 6.3: C-SKY V2 CPU Pseudo Instructions
opcode
description
cmpne r0,r0
clear the C bit
cmplti rd, n+1
checking if rd is less
than or equal n of signed
type.
cmphs rs, rd
checking if rd is less
than rs of unsigned
type.
cmplt rs, rd
checking if rd is greater
than rs of unsigned
type.
abiv1:
jump to sub-routine
bsr label
or
jsri label
abiv2:
bsr label

CPU
all
all

all

all

all

Continued on next page

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

70

Chapter 6. Assembly syntax and directives

Pesudo
jbr label

jbf label

jbt label

rts
neg rd

rotlc rd,1
rotri rd,imm
setc
tstle rd
tstlt rd
tstne rd
bgeni rz,imm
ldq r4-r7,(rx)

Table 6.3 – continued from previous page
opcode
description
abiv1:
unconditional jump
br label
or
jmpi label
abiv2:
br label
abiv1:
jump to the specified
bf label
sub-procedure if C bit is
or
zero
bt 1f
jmpi label
1:…
abiv2:
bf label(16/32 bits)
or
bt 1f (16 bits)
br/jmpi label(32 bits)
1:…
abiv1:
jump when C bit is one
bt label
or
bf 1f
jmpi label
1:…
abiv2:
bt label(16/32 bits)
or
bf 1f(16 bits)
br/jmpi label(32 bits)
1:…
jmp r15
return
from
subprocedure
abiv1:
negate the specified
rsubi rd,0
number
abiv2:
not rd, rd
addi rd, 1
addc rd,rd
addition with carry bit
rotli rd,32-imm
circlly rotate immediate
cmphs r0,r0
set the C bit
cmplti rd,1
checking on if value isn’t
positive
btsti rd,31
checking on if value is
positive
cmplnei rd,0
checking on if value isn’t
zero
movi rz,immpow
set n-th of value as 1,
immpow is 2 power imm other as 0.
ldm r4-r7,(rx)
r4=(rx,0),r5=(rx,4),
r6=(rx,8),r7=(rx,12)

CPU
all

all

all

all
all

all
all
all
all
all
all
V2.0
V2.0
Continued on next page

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

71

Chapter 6. Assembly syntax and directives

Pesudo
stq r4-r7,(rx)
mov rz,rx

movf rz,rx
movt rz,rx
not rz,rx
rsub rz,rx,ry
rsubi rz,rx,ry
sextb rz,rx

sexth rz,rx

zextb rz,rx

zexth rz,rx

lrw rz,imm32
jbez rx,label

jbnez rx,label

jbhz rx,label

jblsz rx,label

Table 6.3 – continued from previous page
opcode
description
stm r4-r7,(rx)
(rx,0)=r4,(rx,4)=r5,
(rx,8)=r6,(rx,12)=r7
mov rz,rx
rz=rx
or
result is mov if both of
lsli rz,rx,0
rz and rz are among r0
to r15. otherwise, result
is lsli.
incf rz,rx,0
move rx to rz if C bit is
0
inct rz,rx,0
move rx to rz if C bit is
1
nor rz,rx,rx
not the rx and move
reuslt to rz
subu rz,ry,rx
rz=ry-rx
movi r1,imm16
rz=imm16-rx
subu rx,r1,rx
sext rz,rx,7,0
signed extending of first
byte of rx and move it to
rz.
sext rz,rx,15,0
signed extending of first
word of rx and move it
to rz.
zext rz,rx,7,0
zero extending of first
byte of rx and move it
to rz.
zext rz,rx,15,0
zero extending of first
word of rx and move it
to rz.
movih rz,imm32_hi16
load an 32 bits immediori rz�rz,imm32_lo16
ate number to register
bez rx,label
jump to sub-procedure
or
if rx == 0
bnez rx,1f
br/jmpi label(32 bits)
1:…
bnez rx,label
jump to sub-procedure
or
if rx != 0
bez rx,1f
br/jmpi label(32 bits)
1:…
bhz rx,label
jump to sub-procedure
or
if rx > 0
blsz rx,1f
br/jmpi label(32 bits)
1:…
blsz rx,label
jump to sub-procedure
or
if rx <= 0
bhz rx,1f
br/jmpi label(32 bits)
1:…

CPU
V2.0
V2.0

V2.0
V2.0
V2.0
V2.0
V2.0
V2.0

V2.0

V2.0

V2.0

V2.0
v2.0

v2.0

v2.0

v2.0

Continued on next page

Release 2.1

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

72

Chapter 6. Assembly syntax and directives

Pesudo
jblz rx,label

jbhsz rx,label

Release 2.1

Table 6.3 – continued from previous page
opcode
description
blz rx,label
jump to sub-procedure
or
if rx < 0
bhsz rx,1f
br/jmpi label(32 bits)
1…
bhsz rx,label
jump to sub-procedure
or
if rx >= 0
blz rx,1f
br/jmpi label(32 bits)
1…

CPU
v2.0

v2.0

Copyright © 2018 Hangzhou C-SKY MicroSystems Co.,Ltd. All rights reserved.

73

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Mode                       : UseOutlines
Page Count                      : 78
Creator                         : LaTeX with hyperref package
Title                           : C-SKY V2 CPU Applications Binary Interface Standards Manual
Author                          : csky
Producer                        : XeTeX 0.99998
Create Date                     : 2018:11:15 17:12:23+08:00

EXIF Metadata provided by EXIF.tools

C SKY V2 CPU Applications Binary Interface Standards Manual

Navigation menu

Versions of this User Manual:

Views

Navigation