The course is scheduled in the second semester (third
quarter). The lectures for the academic year 2017-2018
start on February 9, 2018 at 13:45. The location
is CR 4A (the classroom has changed;
check MyTimetable for the locations of the second and subsequent lectures).
A Blackboard page
for this course exists for the distribution of a small part of
the study material. Most of the material for the course can be
directly accessed from the current (open) page. To access
Blackboard, register via Osiris (students of the
University of Twente should know how to register).
Below you can find
a tentative version of the schedule for the lectures in
academic year 2017-2018.
The contents are subject to change.
A yellow background color for the slide release date either
means that the version for the current academic year is available
(when the date mentions 2018) or that the slides have not been
updated for the current academic year. A pink background means
that only last year's slides are available for the moment.
Case study: simultaneous design of processor and compiler
Below, you find the written material for this course. At the moment,
the list also contains material that will no longer be used. As the
course progresses, the list will be cleaned up and extended.
Andraka, R., A Survey of CORDIC Algorithms for FPGA-Based
Computers, 6th International Symposium on Field
Programmable Gate Arrays, Monterey, CA., pp 191-200, (1998). Online copy (only in UT domain).
Aksoy, L., P. Flores and J. Monteiro, A Tutorial on Multiplierless
Design of FIR Filters: Algorithms and Architectures, Circuits,
Systems and Signal Processing, Vol. 33(6), pp 1689-1719, (2014).
Online copy (only in UT domain).
Anjum, O., T. Ahonen, F. Garzia, J. Nurmi, C. Brunelli and H.
Berg, State-of-the-Art Baseband DSP Platforms for
Software-Defined Radio: A Survey, EURASIP Journal on
Wireless Communication and Networking, Vol. 2011(5). Online
Bhattacharyya, S.S., R. Leupers and P. Marwedel, Software
Synthesis and Code Generation for Signal Processing Systems,
IEEE Transactions on Circuits and Systems---II, Analog and
Digital Signal Processing, Vol. 47(9), (September 2000). Online copy (only in UT domain).
Caption of Figure 6: last subscript of y
should be n-1 instead of n.
Right column of Page 856: Read Figure 9(b)
where 9(a) is mentioned and vice versa.
Contents of Figure 17: In order to be consistent with
next figures, rewrite "x = a - b" and "y = a - b + c * d".
Bouganis, C.S. and G.A. Constantinides, Synthesis of DSP
Algorithms from Infinite Precision Specifications, In: P.
Coussy and A. Morawiec (Eds.), High-Level Synthesis, From
Algorithm to Digital Circuit, Springer, pp 197-214, (2008). Online copy (only in UT domain).
Those interested in a detailed analysis of the probability
density function of the truncation error after
multiplication can consult the following non-compulsory paper:
Ahmadi, A. and M. Zwolinski, Fixed-Point
Multiplication: A Probabilistic Bit-Pattern View,
Microelectronics Reliability, Vol. 51(4), pp 790-796, (April
2011). Online copy (only in UT domain).
You can skip Section 11.3 (2D FIR filters).
Page 203, halfway through the bottom paragraph: add a minus sign to
2's exponent in two places (so 2**n should become 2**-n).
Page 204, Equation 11.10: the "close" parenthesis with
exponent 2 should move to the end of the equation.
Gerez, S.H., S.M. Heemstra de Groot, E.R. Bonsma and M.J.M.
Heijligers, Overlapped Scheduling Techniques for High-Level
Synthesis and Multiprocessor Realizations of DSP Algorithms,
In: J.C. Lopez, R. Hermida and W. Geisselhardt (Eds.), Advanced
Techniques for Embedded System Design and Test, Kluwer Academic
Publishers, Boston, pp 125-150, (1998). Available from the
Restricted Matter section in Blackboard.
You can skip Section 6.5.3 on the efficient computation
of the iteration-period bound.
Gerez, S.H., High-Level Synthesis (Chapter 12) in
VLSI Design Automation, John Wiley and Sons, Chichester,
(1999). Available from the Restricted Matter section in Blackboard.
You can skip Section 12.4.3 on force-directed scheduling.
Goossens, G., D. Lanneer and P. Dytrych, Design of Low
Power Processor Cores using a Retargetable Tool Flow, In:
C. Piguet (Ed.), Low-Power Electronics Design, CRC Press, Boca
He, S. and M. Torkelson, Designing Pipeline FFT Processor
for OFDM (De)modulation, URSI International Symposium on
Signals, Systems and Electronics, ISSSE'98, pp. 257-262, (1998).
Online copy (only in UT domain).
Hewlitt, R.M. and E.S. Swartzlander, Canonical Signed
Digit Representation for FIR Digital Filters, IEEE
Workshop on Signal Processing Systems, SiPS 2000, Lafayette, LA,
pp 416-426, (2000). Online copy (only in UT domain).
Hofstra, K.L. and S.H. Gerez, Arx: A Toolset for the
Efficient Simulation and Direct Synthesis of High-Performance
Signal Processing Algorithms, International Conference on
High Performance Embedded Architectures and Compilers, Ghent,
Belgium, (January 2007). Online copy.
Kotteri, K.A., A.E. Bell and J.E. Carletta, Quantized FIR
Filter Design Using Compensating Zeros, IEEE Signal
Processing Magazine, pp 60-67, (November 2003). Online copy (only in UT domain).
Langlois, J.M.P., D. Al-Khalili and R.J. Inkol, Polyphase
Filter Approach for High Performance, FPGA-Based Quadrature
Demodulation, Journal of VLSI Signal Processing, Vol. 32,
pp 237-254, (2002). Online copy (only in UT domain).
Correction for Equation 6: there should not be a factor 2 in
front of h_LP(m).
Loehning, M., T. Hentschel and G. Fettweis, Digital Down
Conversion in Software Radio Terminals, 10th European
Signal Processing Conference, EUSIPCO 2000, pp 1517-1520,
Parhi, K.K., High-Level Algorithm and Architecture
Transformations for DSP Synthesis, Journal of VLSI Signal
Processing, Vol. 9, pp 121-143, (1995). Online copy (only in UT domain).
You can skip Sections 6 (folding) and 8 (relaxed look-ahead).
Comments on Figure 6. The issue is that unfolding can
improve the processor utilization. The explanation in the
paper is not correct.
The schedule shown in Figure 6(b) is rate optimal,
i.e. it repeats at the iteration-period bound (T0min) value of
3. In this period, the total computation to be
performed amounts to 9 time units (4 operations of 2 time units
and 1 operation of 1 time unit). The
lower bound on the number of processors is 3 (=9/3). However,
this bound cannot be met. The reason is that the schedule
needs to repeat every 3 time units. This means that a separate
processor is necessary for each of the operations A to D that
take two time units (a processor that would execute two of
them would require an iteration period of 4). With 4 processors
available for 3 time units each, the average processor
utilization is 75% (9/12).
Figure 6(c) shows a schedule of the graph after 2-unfolding.
The unfolded graph contains 2 iterations of the original
graph. This schedule is also rate optimal, which means that the
2 iterations are executed in 6 time units. The optimal number
of processors in this situation would again be 3 (=18/6). There
now exists a schedule that reaches 100% processor utilization
(the available 6 time units per processor can now be filled
optimally with operations of 2 time units).
In Figure 6(b), the operations A0, B0, C0, D0 and E0 belong
to one iteration. The schedule has an iteration period
of 3 (A1 starts 3 time units after A0, etc.), a latency
of 7 (output on E0) and a span of 8 (end of D0).
In Figure 6(c), the operations A0/A1, B0/B1, C0/C1, D0/D1 and
E0/E1 belong to one iteration. The schedule has an iteration
period of 6 (A2 starts 6 time units after A0, etc.) and
a latency and span of 12 (output on E1).
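The arithmetic in the comments on Figure 6 can be checked with a small Python sketch. The function names and the list representation of operation durations are assumptions made for this illustration; they do not come from the paper. The durations (2 time units for A to D, 1 for E) and the processor counts (4 before unfolding, 3 after) are taken from the comments above.

```python
from math import ceil

def processor_lower_bound(durations, period):
    """Lower bound on the number of processors: total work / iteration period."""
    return ceil(sum(durations) / period)

def utilization(durations, period, processors):
    """Fraction of the available processor time that is filled with operations."""
    return sum(durations) / (processors * period)

# Original graph: operations A, B, C, D take 2 time units, E takes 1.
original = [2, 2, 2, 2, 1]
print(processor_lower_bound(original, 3))   # 3 (=9/3), but not achievable
print(utilization(original, 3, 4))          # 0.75 (=9/12) with 4 processors

# 2-unfolded graph: two iterations of the original graph, period 6.
unfolded = original * 2
print(processor_lower_bound(unfolded, 6))   # 3 (=18/6)
print(utilization(unfolded, 6, 3))          # 1.0 with 3 processors
```

The sketch only reproduces the counting argument. It does not construct a schedule or check precedence constraints.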
Comments on Figure 9(a). In my opinion, two
inequalities are incorrect: r(A2) - r(M1) <= 2 and r(M4) -
r(A3) <= -1.
Park, H.W., H. Oh and S. Ha, Multiprocessor SoC Design
Methods and Tools, IEEE Signal Processing Magazine, pp.
72-79, (November 2009). Online copy (only in UT domain).
Vaidyanathan, P.P., Multirate Digital Filters, Filter
Banks, Polyphase Networks, and Applications: A Tutorial,
Proceedings of the IEEE, Vol. 78(1), pp 56-93, (January 1990). Online copy (only in UT domain).
Voronenko, Y. and M. Pueschel, Multiplierless Multiple
Constant Multiplication, ACM Transactions on Algorithms,
Vol. 3(2), (May 2007). Online copy (only in UT domain).
Only study Section 1 (until page 6); the rest is optional.
Students who have not followed either of the courses
System-on-Chip Design or
System-on-Chip Design for ES need to become familiar with
the VHDL simulation and synthesis flow.
This amounts to working on
Projects VHD and SYN as described on the
page of the mentioned courses.
The goal is to become familiar with:
The basics of VHDL.
Using Questasim (Modelsim) for VHDL simulation.
Using the scripts for VHDL synthesis with Synopsys,
understanding synthesis reports on performance measures on speed
Performing post-synthesis VHDL simulations.
The students concerned do not need to carry out all exercises. It
is left up to them to spend as much time as needed to achieve the
goals mentioned above. Depending on the student's background, 10 to
20 hours should be sufficient. No reports need to be
submitted. This activity does not contribute to the grading of
The examination of this course will consist of a number of
homework exercises and practical
projects. Students are expected to work in teams of two
(working alone is allowed but not recommended). Details will
The projects for the
2017-2018 edition of this course are as given in the table below
(subject to change):
The mark will be based
on the reports and the defense of your work in an oral examination
session. The mark is basically the sum of points obtained for
the projects divided by 10:
FINAL = (TRA + ARX + GFS)/10
The performance at the oral exam can lead to a correction
of at most one point up or down. In principle, the two members of a
project team will receive the same mark unless there are strong
indications of differences in performance.
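As an illustration of the mark computation, here is a minimal Python sketch. The function name `final_mark`, the way the oral correction is added to the formula, and the point values in the example are assumptions made for this illustration; the actual points per project are given in the project table.

```python
def final_mark(tra, arx, gfs, oral_correction=0.0):
    """FINAL = (TRA + ARX + GFS)/10, plus an oral-exam correction.

    The correction is limited to at most one point up or down.
    Adding it directly to the formula is an assumption of this sketch.
    """
    correction = max(-1.0, min(1.0, oral_correction))
    return (tra + arx + gfs) / 10 + correction

# Hypothetical point totals, for illustration only.
print(final_mark(25, 30, 25))        # 8.0
print(final_mark(25, 30, 25, 0.5))   # 8.5
```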
The course needs to be completed within the quarter in which it
is taught. Because Project GFS was released late, some extra
time will be tolerated as follows:
The deadline for an unlimited final grade (maximum of 10)
is Monday, April 30, 2018, 10:30.
The deadline for a final grade limited to a maximum of 7
is Monday, May 7, 2018, 10:30.
The deadlines apply to sending me
the reports of all 3 projects by e-mail. After that, I expect you to
deliver all reports in hard copy in one of the following ways:
By preference, in person to me, on Friday, May 4 or May 18,
both days between 12:00 and 13:00 in my office.
On other days of the week, in my mailbox located opposite
my office inside the secretariat of the CAES group.
At the moment of hard copy delivery, we make an appointment for
an oral examination (most likely a Friday in May or June). If
you deliver your hard copy in my mailbox, send me an e-mail so
that I can propose a time for the examination session.
My list of books
on different aspects of VLSI design including VLSI for signal