Xilinx Computer Hardware XAPP721 User Manual

Application Note: Virtex-4 Series
High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and OSERDES
XAPP721 (v1.3) February 2, 2006
Author: Maria George
Summary  
This application note describes a data capture technique for a high-performance DDR2  
SDRAM interface. This technique uses the Input Serializer/Deserializer (ISERDES) and Output  
Serializer/Deserializer (OSERDES) features available in every Virtex™-4 I/O. This technique  
can be used for memory interfaces with frequencies of 267 MHz (533 Mb/s) and above.  
Introduction  
A DDR2 SDRAM interface is source-synchronous: the read data and read strobe are
transmitted edge-aligned. To capture this transmitted data using Virtex-4 FPGAs, either the
strobe or the data can be delayed. In this design, the read data is captured in the delayed  
strobe domain and recaptured in the FPGA clock domain in the ISERDES. The received serial,  
double data rate (DDR) read data is converted to 4-bit parallel single data rate (SDR) data at  
half the frequency of the interface using the ISERDES. The differential strobe is placed on a  
clock-capable IO pair in order to access the BUFIO clock resource. The BUFIO clocking  
resource routes the delayed read DQS to its associated data ISERDES clock inputs. The write  
data and strobe transmitted by the FPGA use the OSERDES. The OSERDES converts 4-bit  
parallel data at half the frequency of the interface to DDR data at the interface frequency. The  
controller, datapath, user interface, and all other FPGA slice logic are clocked at half the  
frequency of the interface, resulting in improved design margin at frequencies of 267 MHz and  
above.  
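The 1:4 rate conversion on the read side can be illustrated with a purely behavioral model. The sketch below is not a model of the ISERDES hardware: the function name `deserialize_1_to_4` is hypothetical, and the mapping of bits onto the Q1-Q4 outputs is deliberately left out. It only shows that four DDR bit times (two interface clock cycles) yield one 4-bit word per cycle of the half-frequency clock.

```python
from typing import List, Sequence


def deserialize_1_to_4(ddr_bits: Sequence[int]) -> List[List[int]]:
    """Group a serial DDR bit stream into 4-bit parallel words.

    Four DDR bit times span two interface clock cycles, so one word is
    produced per cycle of a clock running at half the interface frequency.
    """
    if len(ddr_bits) % 4 != 0:
        raise ValueError("stream length must be a multiple of 4 bits")
    return [list(ddr_bits[i:i + 4]) for i in range(0, len(ddr_bits), 4)]


# Eight DDR bits: four interface clock cycles, two half-rate clock cycles.
stream = [1, 0, 1, 1, 0, 0, 1, 0]
words = deserialize_1_to_4(stream)
print(words)
```

Each inner list stands for the SDR data presented to the fabric in one half-rate clock cycle.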
Clocking Scheme
The clocking scheme for this design includes one digital clock manager (DCM) and two phase-  
matched clock dividers (PMCDs) as shown in Figure 1. The controller is clocked at half the  
frequency of the interface using CLKdiv_0. Therefore, the address, bank address, and
command signals (RAS_L, CAS_L, and WE_L) are asserted for two cycles of the fast memory
interface clock (known as "2T" timing). The control signals (CS_L, CKE, and ODT) are driven at
twice the rate (DDR) of the half-frequency clock CLKdiv_0, ensuring that the control signals are
asserted for just one clock cycle of the fast memory interface clock. The clock is forwarded to
the external memory device using the Output Dual Data Rate (ODDR) flip-flops in the Virtex-4  
I/O. This forwarded clock is 180 degrees out of phase with CLKfast_0. Figure 2 shows the  
command and control timing diagram.  
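The relationship between the two clock rates and the resulting signal hold times can be checked with a few lines of arithmetic. This is an illustrative sketch only; the function name `clock_periods_ps` and the dictionary keys are hypothetical labels, not names from the reference design.

```python
def clock_periods_ps(interface_mhz: float) -> dict:
    """Return clock periods and signal hold times in picoseconds."""
    t_fast = 1_000_000.0 / interface_mhz   # fast memory interface clock period
    t_div = 2.0 * t_fast                   # half-frequency clock period (CLKdiv_0)
    return {
        "CLKfast": t_fast,
        "CLKdiv": t_div,
        # Address and command signals change once per CLKdiv_0 cycle ("2T").
        "address_command_hold": t_div,
        # Control signals are driven at DDR rate relative to CLKdiv_0.
        "control_hold": t_div / 2.0,
    }


periods = clock_periods_ps(250.0)
print(periods)
```

For any interface frequency, the address and command hold time equals two fast clock periods and the control hold time equals one.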
© 2005 – 2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.  
All other trademarks are the property of their respective owners.  
March 2006  
Memory Interfaces Solution Guide  
Write Datapath  
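[Block diagram: write data words 0-3 drive the OSERDES inputs D1, D2, D3, and D4 in the IOB (ChipSync™ circuit). The OSERDES CLKDIV input is driven by CLKdiv_90 and the CLK input by CLKfast_90. The OSERDES output drives DQ.]
Figure 3: Write Data Transmitted Using OSERDES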
[Timing diagram: CLKfast_0, CLKfast_90, the clock forwarded to the memory device, Command (WRITE, IDLE), Control (CS_L), Strobe (DQS), and Data (DQ) at the OSERDES output carrying D0, D1, D2, D3.]
Figure 4: Write Strobe (DQS) and Data (DQ) Timing for a Write Latency of Four
Write Timing Analysis
Table 1 shows the write timing analysis for an interface at 333 MHz (667 Mb/s).

Table 1: Write Timing Analysis at 333 MHz

                                           Uncertainties  Uncertainties
Uncertainty Parameters         Value (ps)  before DQS     after DQS      Meaning
T_CLOCK                        3000                                      Clock period.
T_MEMORY_DLL_DUTY_CYCLE_DIST   150         150            150            Duty-cycle distortion from memory DLL is subtracted
                                                                         from clock phase (equal to half the clock period)
                                                                         to determine T_DATA_PERIOD.
T_DATA_PERIOD                  1350                                      Data period is half the clock period with 10%
                                                                         duty-cycle distortion subtracted from it.
T_SETUP                        100         100            0              Specified by memory vendor.
T_HOLD                         175         0              175            Specified by memory vendor.
T_PACKAGE_SKEW                 30          30             30             PCB trace delays for DQS and its associated DQ bits
                                                                         are adjusted to account for package skew. The listed
                                                                         value represents dielectric constant variations.
T_JITTER                       50          50             50             Same DCM used to generate DQS and DQ.
T_CLOCK_SKEW-MAX               50          50             50             Global Clock Tree skew.
T_CLOCK_OUT_PHASE              140         140            140            Phase offset error between different clock outputs
                                                                         of the same DCM.
T_PCB_LAYOUT_SKEW              50          50             50             Skew between data lines and the associated strobe
                                                                         on the board.
Total Uncertainties                        420            495
Start and End of Valid Window              420            855
Final Window                                              435            Final window equals 855 – 420.

Notes:
1. Skew between output flip-flops and output buffers in the same bank is considered to be minimal over voltage and temperature.
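The totals in Table 1 can be reproduced with a short calculation. The sketch below is one reading of the table, not the author's tool: it assumes the memory DLL duty-cycle distortion is accounted for once, through T_DATA_PERIOD, and is not summed again with the per-side uncertainties, because that is the interpretation that yields the listed totals of 420 ps and 495 ps. The names `UNCERTAINTIES` and `write_window` are hypothetical.

```python
T_CLOCK = 3000                       # ps, clock period at 333 MHz
T_MEMORY_DLL_DUTY_CYCLE_DIST = 150   # ps, 10% of the 1500 ps clock phase
T_DATA_PERIOD = T_CLOCK // 2 - T_MEMORY_DLL_DUTY_CYCLE_DIST  # 1350 ps

# (uncertainty before DQS, uncertainty after DQS), in ps, from Table 1.
# Assumption: the DLL distortion row is excluded here because it is
# already reflected in T_DATA_PERIOD.
UNCERTAINTIES = {
    "T_SETUP": (100, 0),
    "T_HOLD": (0, 175),
    "T_PACKAGE_SKEW": (30, 30),
    "T_JITTER": (50, 50),
    "T_CLOCK_SKEW-MAX": (50, 50),
    "T_CLOCK_OUT_PHASE": (140, 140),
    "T_PCB_LAYOUT_SKEW": (50, 50),
}


def write_window(data_period=T_DATA_PERIOD, uncertainties=UNCERTAINTIES):
    """Sum the uncertainties on each side of DQS and derive the valid window."""
    before = sum(b for b, _ in uncertainties.values())
    after = sum(a for _, a in uncertainties.values())
    window_start = before                # valid window opens after 'before'
    window_end = data_period - after     # and closes 'after' ps before the end
    return {
        "total_before": before,
        "total_after": after,
        "window_start": window_start,
        "window_end": window_end,
        "final_window": window_end - window_start,
    }


result = write_window()
print(result)
```

Changing an entry in `UNCERTAINTIES` (for example a different board skew) shows directly how much of the 435 ps final window it consumes.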
Controller to Write Datapath Interface
Table 2 lists the signals required from the controller to the write datapath.

Table 2: Controller to Write Datapath Signals

Signal Name: ctrl_WrEn
Signal Width: 1
Signal Description: Output from the controller to the write datapath. Write DQS and DQ
  generation begins when this signal is asserted.
Notes: Asserted for two CLKdiv_0 cycles for a burst length of 4 and three CLKdiv_0 cycles
  for a burst length of 8. Asserted one CLKdiv_0 cycle earlier than the WRITE command for
  CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of
  this signal with respect to the WRITE command.

Signal Name: ctrl_wr_disable
Signal Width: 1
Signal Description: Output from the controller to the write datapath. Write DQS and DQ
  generation ends when this signal is deasserted.
Notes: Asserted for one CLKdiv_0 cycle for a burst length of 4 and two CLKdiv_0 cycles
  for a burst length of 8. Asserted one CLKdiv_0 cycle earlier than the WRITE command for
  CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of
  this signal with respect to the WRITE command.

Signal Name: ctrl_Odd_Latency
Signal Width: 1
Signal Description: Output from the controller to the write datapath. Asserted when the
  selected CAS latency is an odd number, e.g., 5. Required for generation of write DQS
  and DQ after the correct write latency (CAS latency – 1).
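The assertion lengths and latency rules in Table 2 can be captured as a small lookup. This is a sketch for illustration; the helper names `write_control_cycles`, `write_latency`, and `ctrl_odd_latency` are hypothetical and do not come from the reference design. The cycle counts are copied from the table rather than derived from a formula.

```python
from typing import Dict

# Assertion lengths in CLKdiv_0 cycles, copied from Table 2.
_ASSERT_CYCLES = {
    4: {"ctrl_WrEn": 2, "ctrl_wr_disable": 1},
    8: {"ctrl_WrEn": 3, "ctrl_wr_disable": 2},
}


def write_control_cycles(burst_length: int) -> Dict[str, int]:
    """Return how many CLKdiv_0 cycles each control signal is asserted."""
    if burst_length not in _ASSERT_CYCLES:
        raise ValueError("burst length must be 4 or 8")
    return dict(_ASSERT_CYCLES[burst_length])


def write_latency(cas_latency: int) -> int:
    """Write latency is CAS latency minus one, as stated in Table 2."""
    return cas_latency - 1


def ctrl_odd_latency(cas_latency: int) -> bool:
    """True when the selected CAS latency is odd (for example 5)."""
    return cas_latency % 2 == 1


print(write_control_cycles(4), write_latency(5), ctrl_odd_latency(5))
```

With a CAS latency of 5, the write latency is 4, which is the case drawn in Figure 5 and Figure 6.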
[Timing diagram: CLKdiv_0, the clock forwarded to the memory device, CLKdiv_90, CLKfast_90, Command (WRITE, IDLE), Control (CS_L), ctrl_WrEn, ctrl_wr_disable, user interface data at the FIFO output (D0,D1,D2,D3), OSERDES inputs D1, D2, D3, D4 (X,X,D0,D1 followed by D2,D3,X,X), OSERDES inputs T1, T2, T3, T4 (1,1,0,0 followed by 0,0,1,1), Strobe (DQS), and Data (DQ) at the OSERDES output carrying D0, D1, D2, D3.]
Figure 5: Write DQ Generation with a Write Latency of 4 and a Burst Length of 4
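The D and T input values listed for Figure 5 can be run through a behavioral 4:1 serializer to see how the burst appears on DQ. The sketch assumes that a T input of 1 leaves the pin undriven for that bit time and a T input of 0 drives the matching D input; the function name `serialize_with_tristate` and the `"Z"` marker are hypothetical. It does not model clocking or OSERDES latency.

```python
from typing import List, Sequence

HIGH_Z = "Z"  # marker for a bit time in which the pin is not driven


def serialize_with_tristate(d_words: Sequence[Sequence[str]],
                            t_words: Sequence[Sequence[int]]) -> List[str]:
    """Serialize 4-bit words, replacing bits whose T input is 1 with HIGH_Z."""
    if len(d_words) != len(t_words):
        raise ValueError("need one T word per D word")
    out: List[str] = []
    for d_word, t_word in zip(d_words, t_words):
        if len(d_word) != 4 or len(t_word) != 4:
            raise ValueError("each word must have exactly 4 entries")
        for d, t in zip(d_word, t_word):
            out.append(HIGH_Z if t == 1 else d)
    return out


# Values for OSERDES inputs D1-D4 and T1-T4 as listed for Figure 5.
d_words = [["X", "X", "D0", "D1"], ["D2", "D3", "X", "X"]]
t_words = [[1, 1, 0, 0], [0, 0, 1, 1]]
dq = serialize_with_tristate(d_words, t_words)
print(dq)
```

Under these assumptions the don't-care (X) inputs never reach the pin, and the four data bits appear back to back across the two half-rate clock cycles.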
[Timing diagram: CLKdiv_0, CLKfast_0, the clock forwarded to the memory device, CLKdiv_180, Command (WRITE, IDLE), Control (CS_L), ctrl_WrEn, ctrl_wr_disable, OSERDES inputs D1, D2, D3, D4, OSERDES inputs T1, T2, T3, T4, and Strobe (DQS) at the OSERDES output. Values shown for the OSERDES inputs: 0,0,0,0; 1,1,1,0; 0,1,0,1; 0,0,0,0; 0,0,0,0; 0,1,1,1.]
Figure 6: Write DQS Generation for a Write Latency of 4 and a Burst Length of 4
Read Datapath  
The read datapath comprises the read data capture and recapture stages. Both stages are  
implemented in the built-in ISERDES available in every Virtex-4 I/O. The ISERDES has three  
clock inputs: CLK, OCLK, and CLKDIV. The read data is captured in the CLK (DQS) domain,  
recaptured in the OCLK (FPGA fast clock) domain, and finally transferred to the CLKDIV  
(FPGA divided clock) domain to provide parallel data.  
• CLK: The read DQS routed using BUFIO provides the CLK input of the ISERDES as
  shown in Figure 7.
• OCLK: The OCLK input of ISERDES is connected to the CLK input of OSERDES in
  hardware. In this design, the CLKfast_90 clock is provided to the ISERDES OCLK input
  and the OSERDES CLK input. The clock phase used for OCLK is dictated by the phase
  required for write data.
• CLKDIV: It is imperative for OCLK and CLKDIV clock inputs to be phase-aligned for
  correct functionality. Therefore, the CLKDIV input is provided with CLKdiv_90 that is
  phase-aligned to CLKfast_90.
[Block diagram: DQ passes through a delay element in the IOB that delays read data to align with the strobe and the FPGA clock; the data delay value is determined using a training pattern. The delayed data feeds the ISERDES, whose outputs Q1, Q2, Q3, and Q4 provide read data words 3, 2, 1, and 0 to the user interface FIFOs. DQS is routed through BUFIO to the ISERDES CLK input, CLKfast_90 drives OCLK, and CLKdiv_90 drives CLKDIV.]
Figure 7: Read Data Capture Using ISERDES
Read Timing Analysis  
To capture read data without errors in the ISERDES, read data and strobe must be delayed to  
meet the setup and hold times of the flip-flops in the FPGA clock domain. Read data (DQ) and  
strobe (DQS) are received edge-aligned at the FPGA. The differential DQS pair must be placed  
on a clock-capable IO pair in order to access the BUFIO resource. The received read DQS is  
then routed through the BUFIO resource to the CLK input of the ISERDES of the associated  
data bits. The delay through the BUFIO and clock routing resources shifts the DQS to the right  
with respect to data. The total delay through the BUFIO and clock resource is 595 ps in a -11  
speed grade device and 555 ps in a -12 speed grade device.  
Table 3 shows the read timing analysis at 333 MHz required to determine the delay required on  
DQ bits for centering DQS in the data valid window.  
Table 3: Read Timing Analysis at 333 MHz

Parameter            Value (ps)  Meaning
T_CLOCK              3000        Clock period.
T_PHASE              1500        Clock phase for DDR data.
T_SAMP_BUFIO         350         Sample Window from Virtex-4 data sheet for a -12 device. It includes
                                 setup and hold for an IOB FF, clock jitter, and 150 ps of tap
                                 uncertainty.
T_BUFIO_DCD          100         BUFIO clock resource duty-cycle distortion.
T_DQSQ + T_QHS       580         Worst case memory uncertainties that include VT variations and skew
                                 between DQS and its associated DQs. Because the design includes per
                                 bit deskew, realistically only a percentage of this number should be
                                 considered.
T_MEM_DCD            150         Duty-cycle distortion.
Tap Uncertainty      0           Tap uncertainty with 75 ps resolution. A window detection error of
                                 75 ps can be on both ends of the window. This is already included in
                                 T_SAMP_BUFIO.
Total Uncertainties  1180
Window               320         Worst-case window.

Notes:
1. T_SAMP_BUFIO is the sampling error over VT for a DDR input register in the IOB when using
   the BUFIO clocking resource and the IDELAY.
2. All the parameters listed above are uncertainties to be considered when using the per bit
   calibration technique.
3. Parameters like BUFIO skew, package_skew, pcb_layout_skew, and part of T_DQSQ and
   T_QHS are calibrated out with the per bit calibration technique. Inter-symbol interference and
   crosstalk, contributors to dynamic skew, are not considered in this analysis.
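The worst-case read window in Table 3 follows from subtracting the summed uncertainties from the clock phase. The sketch below only repeats that arithmetic; the names `READ_UNCERTAINTIES` and `read_window` are hypothetical.

```python
T_CLOCK = 3000          # ps, clock period at 333 MHz
T_PHASE = T_CLOCK // 2  # ps, clock phase for DDR data

# Uncertainties in ps, from Table 3. Tap uncertainty is 0 here because
# the table states it is already included in T_SAMP_BUFIO.
READ_UNCERTAINTIES = {
    "T_SAMP_BUFIO": 350,
    "T_BUFIO_DCD": 100,
    "T_DQSQ + T_QHS": 580,
    "T_MEM_DCD": 150,
    "Tap Uncertainty": 0,
}


def read_window(phase=T_PHASE, uncertainties=READ_UNCERTAINTIES):
    """Return (total uncertainties, remaining worst-case window) in ps."""
    total = sum(uncertainties.values())
    return total, phase - total


total, window = read_window()
print(total, window)
```

Because the table notes that only a percentage of T_DQSQ + T_QHS applies with per bit deskew, lowering that entry in the dictionary shows how much the window widens.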
Per Bit Deskew Data Capture Technique  
To ensure reliable data capture in the OCLK and CLKDIV domains in the ISERDES, a training  
sequence is required after memory initialization. The controller issues a WRITE command to  
write a known data pattern to a specified memory location. The controller then issues  
back-to-back read commands to read back the written data from this specified location. The DQ  
bit 0 ISERDES outputs Q1, Q2, Q3, and Q4 are then compared with the known data pattern. If  
they do not match, DQ and DQS are delayed by one tap, and the comparison is performed  
again. The tap increments continue until there is a match. If there is no match even at tap 64,  
then DQ and DQS are both reset to tap 0. DQS tap is set to one, and both DQS and DQ are  
delayed in unit tap increments and the comparison is performed after each tap increment until  
a match is found. With the first detected match, the DQS window count is incremented to 1.  
DQS continues to be delayed in unit tap increments until a mismatch is detected. The DQS  
window count is also incremented along with the tap increments to record the width of the data  
valid window in the FPGA clock domain. DQS is then decremented by half the window count to  
center DQS edges in the center of the data valid window. With the position of DQS fixed, each  
DQ bit is then centered with respect to DQS. The dp_dly_slct_done signal is asserted when the  
centering of all DQ bits associated with its DQS is completed.  
Figure 8 shows the timing waveform for read data and strobe delay determination. The  
waveforms on the left show a case where the DQS is delayed due to BUFIO and clocking  
resource, and the ISERDES outputs do not match the expected data pattern. The waveforms  
on the right show a case where the DQS and DQ are delayed until the ISERDES outputs match  
the expected data pattern. The lower end of the frequency range useful in this design is limited  
by the number of available taps in the IDELAY block, the PCB trace delay, and the CAS latency  
of the memory device.  
[Timing diagram with two cases, each showing CLKdiv_0, CLKfast_0, CLKfast_90, CLKdiv_90, DQS and DQ at the FPGA, DQS and DQ at the ISERDES, DQ captured in the DQS domain, the inputs to the Q1, Q2, Q3, and Q4 registers in the CLKfast_90 domain, and the parallel data at ISERDES outputs Q4, Q3, Q2, Q1. Left: DQS at the ISERDES is delayed only by BUFIO and the clocking resource; the parallel outputs are D2,D3,D0,D1 (incorrect data sequence, no match). Right: DQS and DQ are both delayed by the calibration delay; the parallel outputs are D0,D1,D2,D3 (correct data sequence).]
Figure 8: Read Data and Strobe Delay
Controller to Read Datapath Interface
Table 4 lists the control signals between the controller and the read datapath.

Table 4: Signals between Controller and Read Datapath

Signal Name: ctrl_Dummyread_Start
Signal Width: 1
Signal Description: Output from the controller to the read datapath. When this signal
  is asserted, the strobe and data calibration begin.
Notes: This signal must be asserted when valid read data is available on the data bus.
  This signal is deasserted when the dp_dly_slct_done signal is asserted.

Signal Name: dp_dly_slct_done
Signal Width: 1
Signal Description: Output from the read datapath to the controller indicating the
  strobe and data calibration are complete.
Notes: This signal is asserted when the data and strobe have been calibrated. Normal
  operation begins after this signal is asserted.

Signal Name: ctrl_RdEn_div0
Signal Width: 1
Signal Description: Output from the controller to the read datapath used as the write
  enable to the read data capture FIFOs.
Notes: This signal is asserted for one CLKdiv_0 clock cycle for a burst length of 4 and
  two clock cycles for a burst length of 8. The CAS latency and additive latency values
  determine the timing relationship of this signal with the read state. Figure 9 shows
  the timing waveform for this signal with a CAS latency of 5 and an additive latency
  of 0 for a burst length of 4.
[Timing diagram: CLKdiv_0, CLKfast_0, CLKdiv_90, CLKfast_90, Command (READ), DQ at the memory device (D0 D1 D2 D3), DQS at the memory device, CS# at the memory, DQS at the ISERDES CLK input (after round trip, BUFIO, and calibration delays), ctrl_RdEn_div0 (input to the SRL16 clocked by CLKdiv_90), DQ at the ISERDES input (after round trip, initial tap value, and calibration delays), parallel data at the ISERDES output (D0,D1,D2,D3), srl_out (SRL16 output), and Ctrl_RdEn (write enable to the FIFOs, aligned with the ISERDES data output).]
Figure 9: Read-Enable Timing for CAS Latency of 5 and Burst Length of 4
The ctrl_RdEn signal is required to validate read data because the DDR2 SDRAM devices do  
not provide a read valid or read-enable signal along with read data. The controller generates  
this read-enable signal based on the CAS latency and the burst length. This read-enable signal  
is input to an SRL16 (LUT-based shift register). The number of register stages required to align  
the read-enable signal to the ISERDES read data output is determined during calibration. One  
read-enable signal is generated for each data byte. Figure 10 shows the read-enable logic  
block diagram.  
[Block diagram: ctrl_RdEn_div0 feeds an SRL16 clocked by CLKdiv_90; the number of register stages is selected during calibration. The SRL16 output, srl_out, passes through an FD flip-flop to produce ctrl_RdEn.]
Figure 10: Read-Enable Logic
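The effect of the selectable shift-register depth can be seen in a small behavioral model. This is a sketch under stated assumptions: the class name `Srl16` and its `clock` method are hypothetical, the model treats the selected depth as a plain delay of that many clock cycles, and the FD stage that follows the SRL16 in Figure 10 is not included.

```python
from collections import deque


class Srl16:
    """Behavioral model of a 16-deep shift register with a selectable tap."""

    def __init__(self, stages: int):
        if not 1 <= stages <= 16:
            raise ValueError("stages must be between 1 and 16")
        self.stages = stages
        # Index 0 holds the most recently shifted-in value.
        self.regs = deque([0] * 16, maxlen=16)

    def clock(self, d: int) -> int:
        """Advance one clock cycle; return the input from `stages` cycles ago."""
        out = self.regs[self.stages - 1]  # value visible during this cycle
        self.regs.appendleft(d)           # oldest entry falls off the far end
        return out


# A one-cycle read-enable pulse delayed by three register stages.
srl = Srl16(stages=3)
read_enable_in = [1, 0, 0, 0, 0]
srl_out = [srl.clock(bit) for bit in read_enable_in]
print(srl_out)
```

Choosing a different `stages` value moves the pulse earlier or later, which is how the read enable would be lined up with the ISERDES data output once calibration has picked the depth.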
Reference Design
Figure 11 shows the hierarchy of the reference design. The mem_interface_top is the top-level  
module. This reference design is available on the Xilinx website at:  
[Hierarchy diagram with mem_Interface_top at the root. Modules shown: main, infrastructure, idelay_ctrl, top, test_bench, iobs, user_interface, data_path, ddr2_controller, backend_rom, cmp_rd_data, infrastr_iobs, controller_iobs, datapath_iobs, v4_dqs_iob, backend_fifos, rd_data, data_write, tap_logic, addr_gen, data_gen_16, data_tap_inc, idelay_rd_en_io, v4_dm_iob, v4_dq_iob, rd_wr_addr_fifo, wr_data_fifo_16, rd_data_fifo, tap_ctrl, RAM_D.]
Figure 11: Reference Design Hierarchy
Reference Design Utilization
Table 5 lists the resource utilization for a 64-bit interface including the physical layer, the  
controller, the user interface, and a synthesizable test bench.  
Table 5: Resource Utilization for a 64-Bit Interface

Resources  Utilization  Notes
Slices     5861         Includes the controller, synthesizable test bench, and the user interface.
BUFGs      6            Includes one BUFG for the 200 MHz reference clock for the IDELAY block.
BUFIOs     8            Equals the number of strobes in the interface.
DCMs       1
PMCDs      2
ISERDES    64           Equals the number of data bits in the interface.
OSERDES    88           Equals the sum of the data bits, strobes, and data mask bits.
Conclusion  
The data capture technique explained in this application note using ISERDES provides a good  
margin for high-performance memory interfaces. The high margin can be achieved because all  
the logic in the FPGA fabric is clocked at half the frequency of the interface, eliminating critical  
paths.  
Revision History
The following table shows the revision history for this document.

Date      Version  Revision
12/15/05  1.0      Initial Xilinx release.
12/20/05  1.1      Updated Table 1.
01/04/06  1.2      Updated link to reference design file.
02/02/06  1.3      Updated Table 4.