How to increase the operating frequency of an FPGA design?
As designers, we naturally want the circuits we design to run at the highest possible operating frequency (unless otherwise specified, "operating frequency" here means the frequency inside the FPGA chip). We often hear that trading resources for speed and using pipelining can raise the operating frequency. These are indeed very important methods. Today we will analyze in more depth how to increase the operating frequency of a circuit.
Let’s first analyze what affects the operating frequency of the circuit.
The operating frequency of a circuit is determined mainly by the signal propagation delay between registers and by the clock skew. If the clock is routed on the long lines inside the FPGA (the global clock network), the clock skew is very small and can basically be ignored. For simplicity, we consider only the signal propagation delay here.
The propagation delay of a signal consists of the switching delay of the register, the routing delay, and the delay through the combinational logic. To increase the operating frequency of the circuit, we need to reduce these delays so that they limit the operating frequency of the FPGA as little as possible.
Let’s first look at the switching delay. This delay is determined by the physical characteristics of the device and we cannot change it, so we can only increase the operating frequency by improving the routing and reducing the combinational logic.
1. Reduce the delay by changing the routing.
Taking Altera devices as an example, the Timing Closure Floorplan in Quartus shows many blocks arranged in rows and columns. Each block represents one LAB, and each LAB contains 8 or 10 LEs.
The routing delays compare as follows: within the same LAB (fastest) < same row or column < different row and different column.
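To make the relationship concrete, here is a minimal sketch of how the three delays add up to a minimum clock period. The function name and the nanosecond values are hypothetical, chosen only for illustration, and clock skew is ignored as assumed above.

```python
def max_frequency_mhz(t_register_ns, t_routing_ns, t_logic_ns):
    """Estimate the maximum clock frequency (MHz) of one register-to-register path.

    The minimum clock period is taken as the sum of the register switching
    delay, the routing delay, and the combinational logic delay.
    Clock skew is ignored, as in the text.
    """
    period_ns = t_register_ns + t_routing_ns + t_logic_ns
    if period_ns <= 0:
        raise ValueError("total path delay must be positive")
    return 1000.0 / period_ns  # 1000 / ns gives MHz


# Hypothetical path: 1 ns register delay, 3 ns routing, 6 ns logic.
slow = max_frequency_mhz(1.0, 3.0, 6.0)
# Same path after the combinational logic is cut in half.
fast = max_frequency_mhz(1.0, 3.0, 3.0)
print(f"{slow:.1f} MHz -> {fast:.1f} MHz")
```

Because the register delay is fixed, only the routing and logic terms can be reduced, which is what the rest of the article works on.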
If we give the tools appropriate timing constraints, they will place related logic as close together as possible during placement and routing, which reduces the routing delay. Do not be greedy: a margin of about 5% is generally appropriate. For example, if the circuit must run at 100 MHz, constraining it to 105 MHz is enough. Over-constraining gives poor results and greatly increases the synthesis time. (Note: constraints do not raise the operating frequency only by improving placement and routing; the tools apply other optimizations as well.)
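As a small worked example of the 5% rule, the sketch below converts a target frequency into the slightly tighter frequency and clock period that would be entered as the constraint. The helper name is hypothetical and is not part of any vendor tool.

```python
def constrained_clock(target_mhz, margin=0.05):
    """Return (constraint frequency in MHz, constraint period in ns).

    The constraint is the target frequency plus a small margin
    (5% by default, as suggested in the text).
    """
    if target_mhz <= 0 or margin < 0:
        raise ValueError("target must be positive and margin non-negative")
    constraint_mhz = target_mhz * (1.0 + margin)
    return constraint_mhz, 1000.0 / constraint_mhz


freq_mhz, period_ns = constrained_clock(100.0)
print(f"constrain to {freq_mhz:.1f} MHz, period {period_ns:.3f} ns")
```

For a 100 MHz target this gives a constraint of about 105 MHz, that is, a period of roughly 9.52 ns instead of 10 ns.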
2. Reduce the delay by reducing combinational logic.
We mentioned above that constraints can raise the operating frequency, but at the start of a design we must not pin our hopes for a higher frequency on constraints. Instead, we should avoid large blocks of combinational logic through sound design. This raises the operating frequency, makes the design more portable, and keeps it usable when it is moved to another chip of the same speed grade.
Many FPGAs are based on 4-input LUTs (newer families often use wider LUTs, but the reasoning is the same). If the condition that drives one output depends on more than four inputs, it must be implemented by cascading several LUTs, and each extra LUT in the chain adds another level of combinational logic delay. Reducing combinational logic therefore comes down to using as few input conditions as possible, so that fewer LUTs are cascaded and the combinational delay is smaller.
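The effect of the input count on LUT cascading can be estimated with a few lines of code. This is only a rough sketch: it assumes a simple reduction such as a wide AND or an equality compare, mapped as a tree of LUTs, and the function name is hypothetical. Real synthesis tools may map logic differently.

```python
def lut_levels(num_inputs, lut_size=4):
    """Estimate the LUT levels needed to reduce num_inputs signals to one output.

    Each level groups the remaining signals into LUTs of lut_size inputs,
    so the signal count shrinks by a factor of lut_size per level.
    """
    if num_inputs < 1 or lut_size < 2:
        raise ValueError("need at least 1 input and a LUT size of 2 or more")
    levels = 0
    signals = num_inputs
    while signals > 1:
        signals = -(-signals // lut_size)  # ceiling division
        levels += 1
    return levels


for n in (4, 5, 16, 17, 32):
    print(n, "inputs ->", lut_levels(n), "LUT level(s)")
```

With 4-input LUTs, a condition on up to 4 signals fits in one level, 5 to 16 signals need two levels, and 17 to 64 signals need three.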
The pipelining we often hear about raises the operating frequency by cutting large combinational logic: one or more levels of D flip-flops are inserted into it, so that less combinational logic sits between any two registers. For example, a 32-bit counter has a very long carry chain, which inevitably lowers the operating frequency. We can split it into smaller counters, for example 4-bit and 8-bit stages, and advance the 8-bit counter once each time the 4-bit counter reaches 15. Cutting the counter in this way shortens the carry chain and raises the operating frequency.
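The counter split described above can be checked with a small behavioral model. This is a sketch only: the class name is hypothetical, it models a 4-bit low stage driving a single 8-bit high stage (12 bits in total), and it describes register behavior at each clock edge rather than real hardware timing.

```python
class SplitCounter:
    """12-bit counter built from a 4-bit low stage and an 8-bit high stage."""

    def __init__(self):
        self.low = 0   # 4-bit register
        self.high = 0  # 8-bit register

    def tick(self):
        """Model one rising clock edge; both stages update from the old values."""
        carry = self.low == 15  # 4-input decode of the low stage
        next_low = (self.low + 1) & 0xF
        next_high = (self.high + 1) & 0xFF if carry else self.high
        self.low, self.high = next_low, next_high

    @property
    def value(self):
        """Combined count, with the high stage as the upper bits."""
        return (self.high << 4) | self.low


counter = SplitCounter()
for _ in range(20):
    counter.tick()
print(counter.high, counter.low, counter.value)
```

The high stage only sees a one-bit enable decoded from four inputs, so its carry chain is independent of the low stage, while the combined value still counts exactly like a single 12-bit counter.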
In a state machine, it is generally necessary to move large counters out of the state machine, because a counter usually has more than 4 bits. If its value is used together with other conditions as a state transition criterion, it inevitably increases the number of cascaded LUTs and therefore the combinational logic. Take a 6-bit counter as an example. Originally we wanted the state to change when the counter reaches 111100. Now we put the counter outside the state machine and generate an enable signal when the counter reaches 111011. The enable is registered, so it is asserted one clock later, exactly when the counter reaches 111100, and the state machine only has to test this single bit to trigger the transition. This reduces the combinational logic.
The above are all cases where the combinational logic can be cut by pipelining, but in some cases it is difficult to cut the combinational logic. What should we do then?
The state machine is such an example. We cannot pipeline the state decoding logic. If our design contains a state machine with dozens of states, its state decoding logic will be very large, and it is very likely to be the critical path of the design. So what do we do? The idea is still the same: reduce the combinational logic.
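The timing of the counter example above can be verified with a short cycle-by-cycle model. It is a sketch under stated assumptions: the enable is taken to be a register that samples the compare result on each clock edge, and the names are hypothetical.

```python
TERMINAL_COUNT = 0b111100  # 60: the count at which the state should change
PRE_DECODE = 0b111011      # 59: the count that is actually compared


def simulate(cycles):
    """Return a list of (count, enable) register values after each clock edge."""
    count, enable = 0, False
    trace = []
    for _ in range(cycles):
        # Next values are computed from the current registers,
        # then latched together at the clock edge.
        next_enable = count == PRE_DECODE
        next_count = (count + 1) & 0x3F  # 6-bit counter wraps at 64
        count, enable = next_count, next_enable
        trace.append((count, enable))
    return trace


trace = simulate(200)
print([count for count, enable in trace if enable])
```

In every cycle the enable register is high exactly when the counter holds 111100, so the state machine sees the same event as before but through a single registered bit instead of a 6-input compare.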
We can analyze the outputs of the states, regroup them, and on that basis redefine them as a set of small state machines. By decoding the input (with a case statement) and triggering the corresponding small state machine, the large state machine is cut into smaller ones. In the ATA6 specification (a hard disk standard), there are about 20 commands that can be input, and each command corresponds to many states. Handling all of them in one large state machine, with states nested inside states, is almost unimaginable. Instead, we can use a case statement to decode the command and trigger the corresponding state machine. In this way, the module can run at a relatively high frequency.
The essence of increasing the operating frequency is to reduce the register-to-register delay. The most effective way is to avoid large combinational logic, that is, to keep each condition within four inputs where possible and reduce the number of cascaded LUTs. We can raise the operating frequency by adding constraints, pipelining, and splitting state machines.
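The structure of this split can be sketched in software. The commands and state names below are hypothetical and are not taken from the ATA6 specification. The dictionary lookup plays the role of the case statement that decodes the command, and each small table stands for one small state machine.

```python
# Hypothetical commands and states, chosen only for illustration.
READ_FSM = {"IDLE": "SEEK", "SEEK": "TRANSFER", "TRANSFER": "DONE"}
WRITE_FSM = {"IDLE": "SEEK", "SEEK": "WRITE_DATA",
             "WRITE_DATA": "FLUSH", "FLUSH": "DONE"}
IDENTIFY_FSM = {"IDLE": "SEND_INFO", "SEND_INFO": "DONE"}

# Command decode: selects which small state machine runs.
SUB_MACHINES = {
    "READ": READ_FSM,
    "WRITE": WRITE_FSM,
    "IDENTIFY": IDENTIFY_FSM,
}


def run_command(command):
    """Run the small state machine for one command and return the states visited."""
    fsm = SUB_MACHINES[command]
    state = "IDLE"
    visited = [state]
    while state != "DONE":
        state = fsm[state]  # each small machine decodes only its own states
        visited.append(state)
    return visited


print(run_command("WRITE"))
```

Each small machine only has to decode a handful of states, so the next-state logic of any one of them stays small, which is the point of the split.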