Lab4 Instructions

Lab4_Instructions

User Manual:

Open the PDF directly: View PDF .
Page Count: 11

Page 1 of 11

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

UNIVERSITY OF BRITISH COLUMBIA

CPEN 311 – Digital Systems Design

Winter 2018 Term 2

Lab 4: Memory, Scheduling, and Decryption

Overview

This lab was originally written by Prof. Steve Wilton for CPEN311 in Summer 2015 Term 2 and perhaps also used

for earlier terms. It is essentially unchanged from Prof. Wilton's lab except that I have changed the grading

scheme.

Additionally, and read this carefully, for this lab and for this lab only, I am allowing you to work in groups of 2

(group MUST be formed people from the same lab section - otherwise it will be a logistical nightmare for the

TAs). However, if you do decide to work in groups of 2, there is a price for this, as follows: You must implement

the lab using modular design with multiple FSMs that communicate with each other using the start-finish protocol

or another standardized protocol that you can explain and justify. I recommend doing this anyway, always, even if

you are working alone. But for those working in groups of 2, this is mandatory.

Prof. Wilton solved this lab using just one FSM that he incrementally made more capable. I solved it using the

modular FSM approach which resulted in 5 FSMs being written. Both approaches are valid as long as the FSMs

are built reliably. If you choose to solve this lab as a 1-person lab group like other labs, then you are free to choose

your implementation method. If you are part of a 2-person group, this means that you must choose the modular

FSM approach in order to be able to efficiently divide the work among team members. I am doing this because I

want to promote modular design, standardized protocols between FSMs, and team design as part of this course.

In this lab, you will get experience creating a design that contains several on-chip memories. The circuit you will

create is an RC4 Decryption circuit. RC4 is a popular stream cypher, and until recently was widely used in

encrypting web traffic among other uses. RC4 has now been deemed insecure and has been replaced by several

variants. Still, RC4 is an important algorithm and provides a good vehicle for studying digital circuits that made

extensive use of on-chip memory. It also provides a basis for implementing some of these variants that are more

secure.

In this lab, you will first design an RC4 decryption circuit. The secret key will be obtained from a bank of

switches on your board, and the encrypted message will be given to you as a ROM initialization file. In Task 1

and 2, you will build the basic decryption circuit. In Task 3, you will extend this to build an RC4 cracking circuit;

the circuit will implement a ‘brute-force’ attack on RC4 by cycling through the entire keyspace and stopping when

a successful decryption is performed. Those of you that want to go further can consider building multiple

functional units, each of which cycles through a portion of the keyspace in parallel for faster cracking (Bonus

Task).

Do not be fooled by the complexity of this lab. Although it is more complicated than previous labs, we are still

talking about an algorithm that can be expressed in a flowchart (or, in the case of the modular FSM approach,

multiple flowcharts). You already know from the lectures and Lab 2 how to reliably convert any proper FSM

flowchart to an FSM. In essence, therefore, this is not fundamentally different from Lab 2 except that the

algorithm and flowcharts will be much more complicated.

Page 2 of 11

Background: RC4 Decryption

This section describes the RC4 decryption algorithm. You can find more information by doing a Google Search,

but the information here should be sufficient to complete this lab. Interestingly, the same algorithm is used for

both encryption and decryption, but we will only use it for decryption in this lab.

RC4 is a stream cipher. Based on a key, the algorithm generates a series of bytes. Each byte is XOR’ed with one

byte of a message to produce the decoded text.

The basic RC4 algorithm is shown on the following page:

Page 3 of 11

// Input:

// secret_key [] : array of bytes that represent the secret key. In our implementation,

// we will assume a key of 24 bits, meaning this array is 3 bytes long

// encrypted_input []: array of bytes that represent the encrypted message. In our

// implementation, we will assume the input message is 32 bytes

// Output:

// decrypted _output []: array of bytes that represent the decrypted result. This will

// always be the same length as encrypted_input [].

// initialize s array. You will build this in Task 1

for i = 0 to 255 {

s[i] = i;

}

// shuffle the array based on the secret key. You will build this in Task 2

j = 0

for i = 0 to 255 {

j = (j + s[i] + secret_key[i mod keylength] ) mod 256 //keylength is 3 in our impl.

swap values of s[i] and s[j]

}

// compute one byte per character in the encrypted message. You will build this in Task 2

i = 0, j=0

for k = 0 to message_length-1 { // message_length is 32 in our implementation

i = (i+1) mod 256

j = (j+s[i]) mod 256

swap values of s[i] and s[j]

f = s[ (s[i]+s[j]) mod 256 ]

decrypted_output[k] = f xor encrypted_input[k] // 8 bit wide XOR function

}

However, if you define the variables i, j, and f as 8-bit registers, and since s[i] and s[j] are also 8-bit numbers, then

due to the "automatic" modulo-256 that we get by having an 8-bit number, we can drop all the "mod 256" and

simply write:

// initialize s array. You will build this in Task 1

for i = 0 to 255 {

s[i] = i;

}

// shuffle the array based on the secret key. You will build this in Task 2

j = 0

for i = 0 to 255 {

j = (j + s[i] + secret_key[i mod keylength] ) //keylength is 3 in our impl.

swap values of s[i] and s[j]

}

// compute one byte per character in the encrypted message. You will build this in Task 2

i = 0, j=0

for k = 0 to message_length-1 { // message_length is 32 in our implementation

i = i+1

j = j+s[i]

swap values of s[i] and s[j]

f = s[ (s[i]+s[j]) ]

decrypted_output[k] = f xor encrypted_input[k] // 8 bit wide XOR function

}

This is a significant simplification since the modulo operations could be synthesized by the compiler using

divisions, which we know takes up a lot of resources.

Page 4 of 11

The length of the secret key can vary in different applications, but is typically 40 bits (8 bytes). In our

implementation, we will assume a 24 bit (3 byte) key. We are using a smaller key to ensure that you can “crack”

the implementation in Task 3 in a reasonable amount of time. Note that in the second loop above, secret_key[0]

refers to the first byte of the key, secret_key[1] refers to the second byte of the secret key, and secret_key[2] refers

to the third byte of the secret key. This is illustrated below; in this example, the 24-bit secret key is

000000110101111100111100 = x035F3C. The diagram shows how each of secret_key[0], secret_key[1], and

secret_key[2] are found.

secret_key[0] = 00111011 = x03 = 3

000000110101111100111100

secret_key[1] = 01011111 = x5F = 95

secret_key[2] = 00111100 = x3C = 60

The encrypted message (the input) consists of 32 bytes, and in the above pseudo-code, encrypted_input[k] refers

to the kth byte in the encrypted input. The decrypted message (the output) will also be 32 bytes; in the above

pseudo-code, decrypted_output[k] refers to the kth byte in the encrypted output.

Page 5 of 11

Task 1: Creating a memory, instantiating it, and writing to it (20% of functionality)

To get started, download the template and open the Quartus project. In this task, you will get started by creating a

RAM using the Megafunction Wizard, creating circuitry to fill the memory, and observing the memory contents

using the In-System Memory Content Editor.

a) Creating a RAM block using the Wizard

(If you have a DE2 and are using Quartus 13.0sp1) Choose Tools->MegaWizard Plug In Manager. Choose

to create a new custom megafunction variation. In the left panel, expand Memory Compiler, and choose RAM:

1-Port.

(If you have a DE1-SoC and are using Quartus 14.1) Choose Tools->IP Catalog. In the IP Catalog, select

Basic Functions->On Chip Memory->RAM: 1-Port.

Create an output file called s_memory.vhd in your project directory and hit next. In the next few panels,

customize your Megafunction as indicated below.

How wide should the ‘q’ output bus be? 8 bits

How many 8-bit words of memory? 256 words

What should the memory block type be? M4K (DE2), or M10K (DE1-SoC)

What clocking method would you like to use? Single clock

Which ports should be registered: Make sure ‘q’ output port is unselected

Create one clock enable signal… : do not select

Create an ‘aclr’ asynchronous clear… : do not select

Create a ‘rden’ read enable… : do not select

(DE1-SoC only) What should the q output be…: New Data

Do you want to specify the initial contents? No

Allow In-System Memory Content Editor to capture and update.. Select this option

The ‘Instance ID’ of this RAM is S

Generate netlist: do not select

Output files: select s_memory.cmp (VHDL Component declaration file)

Do you want to add the Quartus II File to the project? Yes

When you finish this, you will find the file s_memory.qip in your project file list. Click on the triangle beside

s_memory.qip to expand, and you will see s_memory.vhd. Open this file and examine it. Near the top of your

file, you will find the Entity declaration for s_memory, which will look something like this:

Be sure your declaration matches the above. This is the entity you will include as a component in your design.

b) Setting Assignment (if you are using a DE1-SoC, skip this step)

Because you are going to use the In-System Memory Content Editor, you must make one additional setting in your

project. Choose Assignments->Settings and open Analysis And Synthesis Settings->Default Parameters.

Page 6 of 11

Type the parameter name CYCLONEII_SAFE_WRITE and assign the value RESTRUCTURE as shown in the

following diagram (be sure to spell it exactly right). Click ADD and then OK.

c) Creating a VHDL module that writes to your memory

Look at the top-level file ksa.vhd. You are to add code to implement the following algorithm (this is the very first

part of RC4).

for i = 0 to 255 {

s[i] = i;

}

If you see a message something like

“Error (15684): M4K memory block WYSIWYG primitive

"s_memory:u0|altsyncram:altsyncram_component|altsyncram_5ee1:auto_generated|altsyncram_l2a2:alts

yncram1|ram_block3a1" utilizes the dual-port dual-clock mode. However, this mode is not supported in

Cyclone II device family in this version of Quartus II software. Please refer to the Cyclone II FPGA

Family Errata Sheet for more information on this feature.

Then it is probably because you either forgot step (b) above, or else mistyped the name or default setting. Open

the settings again and verify that it is correct. Be sure your design compiles without errors before going on.

d) Running your code.

Program your design and run it as usual. The circuit will fill the on-chip memory with the numbers 0-255. To

examine the contents of the memory, you can choose Tools->In System Memory Content Editor (while your

circuit is running). This will open a window that shows the memory contents. In the instance manager window

(left) you should see a list of memories in your design (in this case, it is only S). Right click S, and choose Read

Data from In-System Memory.

Page 7 of 11

This memory contents will be read and displayed. You should see something like:

As you can see, the memory has been filled. Each location i contains the value i. Be sure that your program has

filled the memory as expected. You will use the In-System Memory Content Editor to help you debug your lab in

Task 2, so be sure you are able to get this working before moving on.

Task 2: Building a Single Decryption Core (40% of functionality)

In this task, you are to continue your design from Task 1 to implement a single decryption core. Given a 24 bit

key, and an encrypted message in a ROM, your algorithm will decrypt the message and store the result in another

memory. You will then use the In-System Memory Content Editor to read the result to ensure the encryption

occurred as expected.

As shown below, your system will consist of three memories and a state machine/datapath. The initial encrypted

message is stored in a 32-word ROM (which you will initialize using a .mif file when you compile the design).

The result is stored in a 32-word RAM (which you can examine using the In-System Memory Content Editor after

the decryption is complete. In addition, you will use a 256x8 bit S memory (the same memory you used in Task

2). The secret key is set using the slider switches. Note that the secret key should be 24 bits long, but you only

have 18 slider switches on the DE2 board (and only 10 on the DE1-SoC board). For this task only (not Task 3),

you can hardwire the upper 14 bits of the secret key to 0.

The diagram below shows the design of your system. Although it shows bits 23-18 of the key are “0” and the

switches are used for bits 17-0, you should set bits 23-10 to “0” and just use bits 9-0 from the switches.

Page 8 of 11

Encrypted

Message

32x8 ROM

Decrypted

Message

RAM

32x8

Working Memory

RAM (S)

(256x8)

Your

State Machine/

Datapath

address_d

data_d

wren_d

q_d

address

data

wen

qq_m

address_m

secret_key

(24 bits)

"000000"

bit 0

SW(0)

SW(1)

SW(2)

SW(3)

SW(4)

SW(5)

SW(6)

SW(7)

SW(8)

SW(9)

SW(10)

SW(11)

SW(12)

SW(13)

SW(14)

SW(15)

SW(16)

SW(17)

bit 1

bit 2

bit 3

bit 4

bit 5

bit 6

bit 7

bit 8

bit 9

bit 10

bit 11

bit 12

bit 13

bit 14

bit 15

bit 16

bit 17

bits 18-23

The encrypted message we

give you goes here (the input)

The result

goes here

Working memory

Task 2a) Second Loop in algorithm

Starting with your Task 2 code, add hardware to implement the second loop from the algorithm on the second page

of this handout. That is, add code to implement the following (remember what I said about being able to omit

the "mod 256" operation in the "Background" section if you define the relevant quantities as 8-bit wide.

However you still need to implement "i mod keylength". The Verilog keyword for mod is "%")

j = 0

for i = 0 to 255 {

j = (j + s[i] + secret_key[i mod keylength] ) mod 256 //keylength is 3 in our impl.

swap values of s[i] and s[j]

}

This code does not use the encoded message ROM or the decoded message RAM (you will add that in part b

below). Test your code as follows. Set the secret key to 00000000 00000010 01001001 = x000249 (remember

you can set the lower order 10 bits of secret key using the slider switches as in the above diagram) and examine

the s array using the in-system memory content editor as before. Note that in this example secret_key [0] = 0x0,

secret_key [1] = 0x2, secret_key[2] = 0x49. Do not mix up the order of these or you will not get the right results!

You should see the following. If you do, your code is likely correct; if not, you have some debugging to do.

Don’t move on until you have this right.

Page 9 of 11

Task 2b) Rest of algorithm

Now you will complete the rest of the algorithm (the third for loop in the pseudo-code on Page 2 of this handout).

This will require adding two memories: one to store the encrypted message (this could be a ROM since the

message to decrypt will be compiled with your hardware) and one to store the decrypted message (this could be a

RAM since your circuit must write the result to it). To create the ROM and RAM, it is suggested that you use the

MegaWizard Plug In Manager (as you did for Task 2).

When creating a ROM using the MegaWizard Plug-In Manager (Quartus 13.0) or IP Library (Quartus 14.1), you

will have the ability to indicate an initialization file name. Use the name message.mif. There are example

message.mif files included in your project directory (under the folder secret_messages); you will need to make

sure this file is in your working directory before you create your component with the MegaWizard Plug In

Manager. During compilation, the contents in this file is then read (notice: it is read at compile time), and is used

as the initial contents of the memory. There are several other settings you will need to specify in the MegaWizard

Plug In Manager; you should know enough now that you can figure out what to use for each of these settings.

Test your design. If you decrypt the first message (in secret_messages/msg_1_for_task2b) (with secret key

00000000 00000010 01001001 = x000249) you should see the following if you use the In System Memory

Content editor to view the contents of the Decrypted Message RAM after your algorithm completes (note: not the

s memory; that will continue to contain pseudo-random bytes). As you can see, the decrypted message is an

ASCII string that you can read.

Check the other messages in the folder (they each have their own secret key). For each, you will need to replace

message.mif in your working directory. Due to a bug in Quartus II, you must also delete the db directory in

your working directory before recompiling (the db directory acts as a cache, and caches your most recent

message.mif file). Remember that the contents of the initial memory contents file is read at compile time, not run

time. Of course, the slider switches are read at run-time. Be sure to use the slider switches to set the appropriate

secret key (each of the three messages has its own secret key).

Common error: Watch out for a warning such as:

Critical Warning (127003): Can't find Memory Initialization File or Hexadecimal (Intel-Format) File

./db/message_mod.mif -- setting all initial values to 0

If you see this warning, it did not find your message.mif file during compilation. Make sure it is in your project

directory, and that you specified it properly when you made your component.

When you can read all the message, you can move on to Task 3. Be sure to back up your code; if you don’t get

Task 3 working, you can get part marks for demoing what you have here.

Page 10 of 11

Task 3: Cracking RC-4 (40% of functionality)

In this task, you are going to modify your design from Task 2 to “crack” a message that has been encrypted using

RC4. In this case, you have the encrypted message, but you do not know the key. You will implement a brute-

force algorithm which cycles through all possible keys. For each key, if the result string is a readable message,

then you will assume the string is correct. More precisely, your circuit will search through the key space until it

finds a result in which, for every character in the decoded output, the character is in the range [97,122]

(corresponding to the characters ‘a’ to ‘z’) or the value 32 (which is a space). All characters in the correct output

will either be a lower case letter or a space; the correct message will contain no upper character letters or

punctuation. Note that it is not enough that just one character is a lower case letter or space; all 32 characters in the

decoded output must be lower case letters or a space before you deem it correct. Of course, there is a chance that

even though a key leads to an ASCII string, the string is not actually correct, but if you work out the math you will

see that the odds of this are extremely low.

Prof. Wilton's implementation of this Task takes about 10 minutes to cycle through the entire key space (I haven't

timed my solution over the entire key space, but it will likely take longer since my solution is slower than Prof.

Wilton's). To make your life easier in the lab, I will tell you that, for every test case we will give you, the two most

significant bits in the correct key are 0. This reduces the size of the keyspace you have to search, meaning you

should be able to search the entire keyspace in a few minutes.

When your circuit finds a correct message, it should put the message in the result RAM and enter an endless loop

in which it does nothing. You can use the In System Memory Content editor to view the output to make sure it is

correct. You should also turn on an LED to indicate that the search has complete. If you search the entire

keyspace and do not find a correct message, then you should turn on a different LED.

The key that is being considered should be displayed on the HEX displays. The project contains a module in

SevenSegmentDisplayDecoder.v which can be used for this purpose. The input to this module is a 4-bit value, and

the output is a 7-bit value which is used to drive the 7-segment display on the DE2/DE1-SoC board. Since your

key is 24 bits, it will require 6 digits of display. Use HEX0 to HEX5. As your design quickly searches through

the keyspace, the HEX displays should be updated in real time, and show the key that is currently being

considered.

There are encoded messages available within the project directory, in the secret_messages/msg_*_for_task3

folders. You can test your design on these messages; you should be able to “crack” each test case. For your

demonstration, the TA will give you a new encoded message that you have never seen before, so make sure your

design works before demoing!

Challenge Task: Multi-Core Cracking (this qualifies for the full 15% Creativity Bonus)

In the challenge task, you will accelerate the keyspace search by using multiple instantiations of the decryption

circuit (core) in Task 3, and using each one to search a subset of the keyspace simultaneously. For example, if you

have two cores, one core could search all even keys and one could search all odd keys. If you have 4 cores, each

could search ¼ of the search space. When one unit finds a correct message, all units should stop.

For the challenge task, modify your design so that it contains at least four decryption cores that operate

simultaneously.

Page 11 of 11

Marking: What to demo depends on how far you get.

If you get the challenge task done, demo that, and explain to the TA how you parallelized the code across multiple

cores. The TA will likely give you a new encrypted message for you to solve. If your implementation is

successful, you do not need to demo Tasks 2 or 3.

If you get Task 3 done, but not the challenge task, demo a working Task 3. The TA will likely give you a new

encrypted message for you to solve. In this works, you do not need to demo Task 2.

If you only get Task 2 done, demo that. If you have Task 2 plus parts of Task 3 done (but not a complete working

system), demo Task 2 and show the TA what you have for Task 3 for possible part marks.

If you only get Task 1 done, demo that. If you have Task 1 plus parts of Task 2 done (but not a complete working

system), demo Task 1 and show the TA what you have for Task 2 for possible part marks.

If you didn’t get Task 1 done, show the TA what you have for possible part marks.

Specific grading requirements for this lab

To receive full points for simulation, you need to simulate all FSMs that you write, and demonstrate/explain those

simulations to the TA.

Do not forget the rules of good design:

- Always design while thinking about the hardware

implementation

- Use simple structures

- Incremental design and test

- Modular design: divide the work, work in

parallel in the group

- Use the RTL viewer

- Verification and test: Use simulations,

LEDs, 7-segments, SignalTap, LCD scope (or

SignalTap equivalent)

- Write Clean, Neat, and Legible code

- Give meaningful names to variables, use comments

- Go over and understand the warnings given by

Quartus during compilation

- The circuit should be correct by design, not just

by simulation.

- The code should be verifiable by inspection

- Design in a modular fashion, always thinking

about future reutilization of the module.

- Re-use proven modules, instead of re-inventing

the wheel all the time

Lab4 Instructions

Lab4_Instructions

Lab4_Instructions

Navigation menu

Versions of this User Manual:

Views

Navigation