Sequential circuits typically account for approximately 20%–40% of the total circuit elements in a general-purpose digital integrated circuit, although the exact percentage varies with the intended function of the chip and its design requirements [1, 6,7,8]. The performance indicators of integrated circuits are summarized as PPA, which refers to power consumption, speed, and chip area. Research has been conducted to improve these three metrics of semiconductor chips [2,3,4,5]. The set and reset functions of a D flip-flop (DFF) are very important in sequential circuits. These functions support synchronization and timing coordination between clock domains. They also contribute to efficient power management by facilitating proper power-on/off sequences and selective activation, thereby reducing power consumption. DFF cells in a standard CMOS library generally use either the tri-state gate (TSG) method or the pass-transistor gate (PTG) method [1]. The TSG and PTG operate as switches controlled by the clock signal, so both the clock and the inverted clock (clock bar) signals are required. In this article, we define a DFF cell with set/reset signals as DFFSR. Compared with a DFF cell, the DFFSR cell requires additional NAND and OR gates to implement the reset/set function. This added logic makes the circuit more complex and enlarges the cell, so a DFFSR occupies more area than a standard DFF. For example, when the PTG circuit structure is applied in a 180 nm standard CMOS process, the area of the DFFSR is known to be 1.53 times that of the DFF, an increase of 53%. This significantly increases the sequential circuit area of the chip and restricts the use of DFFSR cells during logic synthesis in the RTL design flow.
Previous studies have reported area reductions for the DFF, but research on the DFFSR is relatively limited [6,7,8]. One approach to reducing the area of DFF cells is the gate diffusion input (GDI) technique [1, 2, 9,10,11,12,13,14,15,16,17,18,19]. In this paper, we propose an area-efficient DFFSR based on the GDI technique.
The paper is organized as follows. Section II introduces the characteristics of GDI. Section III proposes a new structure of DFFSR circuit applying the GDI technique, applies it to a sample RTL circuit to synthesize the logic circuit, and evaluates the performance based on the results reported by Synopsys Design Compiler (DC).
The basic cell of GDI is shown in Figure 1. GDI uses the gate and source terminals of the CMOS inverter as input signals [9,10,11,12,13,14,15,16,17,18,19]. As shown in Table 1, various logic circuit operations can be performed using the same circuit, depending on the input conditions. Figure 2 shows the GDI circuit implementation of AND, OR, and MUX, as presented in Table 1. It can be seen that similar operations can be implemented with a smaller number of MOS transistors compared with conventional CMOS techniques. Therefore, a small chip area and, in some cases, low power and high-speed operations are possible.

Basic GDI cell and its function. GDI, gate diffusion input.

GDI circuit implementations of logic functions. (A) AND (B) OR (C) MUX.
Functions of GDI technique.
| Input G | Input P | Input N | Output | Function |
|---|---|---|---|---|
| A | B | ‘1’ | A + B | OR |
| A | ‘0’ | B | A·B | AND |
| A | B | C | Ā·B + A·C | MUX |
| A | ‘1’ | ‘0’ | Ā | NOT |
GDI, gate diffusion input.
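The input assignments in Table 1 can be checked with a small logic-level model. The following is a minimal sketch that assumes an ideal GDI cell (no threshold drop) in which G = 0 passes the P input and G = 1 passes the N input; the function names are illustrative and do not come from the cell library.

```python
from itertools import product

def gdi_cell(g: int, p: int, n: int) -> int:
    """Ideal logic-level model of one GDI cell (assumption: full-swing outputs).

    G drives both gates: G = 0 turns the PMOS on and passes P,
    G = 1 turns the NMOS on and passes N.
    """
    return n if g else p

# Input assignments taken from Table 1 (hypothetical helper names).
def gdi_or(a, b):
    return gdi_cell(a, b, 1)

def gdi_and(a, b):
    return gdi_cell(a, 0, b)

def gdi_mux(a, b, c):
    return gdi_cell(a, b, c)

def gdi_not(a):
    return gdi_cell(a, 1, 0)

# Exhaustive check against the expected Boolean functions.
for a, b, c in product((0, 1), repeat=3):
    assert gdi_or(a, b) == (a | b)
    assert gdi_and(a, b) == (a & b)
    assert gdi_mux(a, b, c) == (c if a else b)
    assert gdi_not(a) == 1 - a
```

The model only captures logic values; the voltage degradation of a real GDI cell is outside its scope.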
MOS transistors have nonzero threshold voltages, so an NMOS transistor cannot pass a full VDD level, and a PMOS transistor cannot pass a full 0 V level. Therefore, the output voltage of the GDI cell can be 0, |VTP|, VDD − VTN, or VDD. In other words, intermediate voltages can appear [1, 2]. Here, VTN represents the threshold voltage of the NMOS, and VTP represents the threshold voltage of the PMOS. This output voltage distortion is caused by the threshold voltage drop across the MOS transistors of the GDI cell. For this reason, a cascaded GDI cell structure has the disadvantage that functional errors can occur [2]. A cascade design is defined as a series-connected circuit in which the output of one GDI cell is connected to the input of the next GDI cell. The distortion of the output voltage is a major drawback of GDI technology and limits the wide application of the GDI technique in chip development. Ultimately, designing a chip composed solely of GDI cells is not feasible. The most effective way to overcome these problems is to ensure that GDI cells are not placed consecutively during logic circuit synthesis in the RTL design, but are instead mixed with CMOS cells.
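The rule that GDI cells should not drive other GDI cells can be expressed as a simple netlist check. The sketch below is illustrative only: the netlist representation (a dictionary from instance name to cell kind and fanout list) and all instance names are assumptions, not the format of any EDA tool.

```python
def find_gdi_cascades(netlist):
    """Return (driver, load) pairs where a GDI output feeds another GDI cell.

    `netlist` maps an instance name to (kind, fanout), where kind is
    "GDI" or "CMOS" and fanout lists the instances driven by its output.
    This structure is a hypothetical stand-in for a synthesized netlist.
    """
    cascades = []
    for name, (kind, fanout) in netlist.items():
        if kind != "GDI":
            continue
        for load in fanout:
            if netlist[load][0] == "GDI":
                cascades.append((name, load))
    return cascades

# Hypothetical example: one legal GDI -> CMOS -> GDI chain and one violation.
example = {
    "u_mux": ("GDI", ["u_inv"]),
    "u_inv": ("CMOS", ["u_or"]),
    "u_or": ("GDI", ["u_nand", "u_and"]),
    "u_nand": ("CMOS", []),
    "u_and": ("GDI", []),
}
print(find_gdi_cascades(example))  # [('u_or', 'u_and')]
```

An empty result means every GDI output is restored by a CMOS stage before it reaches another GDI cell.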
Figure 3 shows a hybrid GDI structure that avoids these operational errors and enables low-power, small-area chip design using GDI technology [2]. In the hybrid GDI technique, a cascaded circuit mixes CMOS and GDI cells, as shown in Figure 3. Because the designed chip mixes CMOS and GDI cells, the area saving may be smaller than for a chip designed only with GDI cells. In addition, because logic synthesis and placement and routing (P&R) are carried out in an EDA environment such as openROAD, Cadence Innovus, or Synopsys Fusion Compiler, a new library that characterizes the GDI cells must be provided in addition to the CMOS library.

Hybrid GDI technique. GDI, gate diffusion input.
In this paper, a set/reset DFF cell designed only with traditional CMOS gates is defined as DFFSR_C, and a set/reset DFF cell with GDI cells is defined as DFFSR_G. Figure 4 shows a typical PTG CMOS DFF cell with master/slave stages and set/reset signals. A typical DFFSR_C uses a minimum of 42 MOS transistors in total. The master stage uses 16 MOS transistors, and the slave stage uses 20 MOS transistors. Table 2 shows its operational truth table. RN represents the active-low reset signal, SN represents the active-low set signal, and D and CK represent data and clock, respectively. DFFSR_C consists of four PTGs, two OR cells, two NAND cells, and four inverter cells. The RN and SN signals are asynchronous inputs that can occur independently of CK.

Conventional CMOS set/reset D flip-flop (DFFSR_C).
Truth table of DFFSR operation.
| RN | SN | D | CK | Q[n+1] | QN[n+1] |
|---|---|---|---|---|---|
| 0 | 1 | X | X | 0 | 1 |
| 1 | 0 | X | X | 1 | 0 |
| 0 | 0 | X | X | 1 | 0 |
| 1 | 1 | 0 | ↑ | 0 | 1 |
| 1 | 1 | 1 | ↑ | 1 | 0 |
| 1 | 1 | X | ↓ | Q[n] | QN[n] |
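
The behavior in the truth table can be captured in a short behavioral model. This is a minimal sketch, not the transistor-level circuit: the class name is hypothetical, and it assumes that set takes priority when RN and SN are both 0.

```python
class DffsrModel:
    """Behavioral model of a D flip-flop with active-low asynchronous set/reset."""

    def __init__(self):
        self.q = 0
        self._ck = 0  # previous clock level, used to detect rising edges

    def step(self, rn, sn, d, ck):
        """Apply one set of input levels and return (Q, QN)."""
        rising = self._ck == 0 and ck == 1
        self._ck = ck
        if sn == 0:        # asynchronous set (assumed to win when RN is also 0)
            self.q = 1
        elif rn == 0:      # asynchronous reset
            self.q = 0
        elif rising:       # normal operation: capture D on the rising edge of CK
            self.q = d
        # otherwise hold the stored value
        return self.q, 1 - self.q

ff = DffsrModel()
print(ff.step(rn=0, sn=1, d=1, ck=0))  # reset: (0, 1)
print(ff.step(rn=1, sn=0, d=0, ck=0))  # set: (1, 0)
print(ff.step(rn=1, sn=1, d=0, ck=1))  # rising edge captures D = 0: (0, 1)
print(ff.step(rn=1, sn=1, d=1, ck=0))  # falling edge holds: (0, 1)
```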
Figure 5 shows a novel PTG D flip-flop cell (DFFSR_G) with master/slave stages and set/reset signals based on the GDI technique. The proposed DFFSR_G circuit uses two GDI MUX circuits and two GDI OR circuits, as shown in the blue dotted boxes. The remaining NOT series cells are CMOS cells. The PTG circuits used in DFFSR_C are replaced with GDI MUX circuits, and the CMOS OR cells are replaced with GDI OR circuits. The conventional DFFSR_C requires four CLK and four CLKB connections, whereas DFFSR_G requires only two CLK inputs and no CLKB. Therefore, no additional circuit for generating CLKB is needed, and the total number of transistors is reduced substantially. DFFSR_G overcomes the drawback described in Section II by avoiding the cascaded GDI circuit structure. The circuit is thus an example of the hybrid GDI structure.

Proposed GDI set/reset D flip-flop (DFFSR_G). GDI, gate diffusion input.
The proposed DFFSR_G uses just 28 MOS transistors. The GDI OR gate reduces the area of the CMOS OR and PTG cells while simultaneously avoiding the cascaded GDI structure. Because GDI gate outputs may not fully swing, static current may occur in subsequent CMOS stages. The static current may occur in the inverter and NAND cell connected to the next stage of the GDI cell, as shown in the DFFSR_G circuit of Figure 5.
There are various ways to reduce the static current. In this study, the width and length of the MOS transistors in the inverter and NAND cells connected to the GDI cell were optimized while the circuit structure was maintained. This reduces the static current, but it also reduces the driving strength of the CMOS cell, which may increase the delay time and lower the operating speed. When this approach is applied to DFFSR_G, it is therefore necessary to balance power consumption against operating speed.
The characteristics of the DFFSR_G cell were extracted and compared with those of DFFSR_C. The simulations were performed with HSPICE and Spectre using 180 nm standard CMOS process parameters at an operating voltage of 1.8 V. The MOS length is 180 nm, and the width is 900 nm for PMOS and 600 nm for NMOS; these dimensions are the same as in the rest of the CMOS circuit, except for the GDI cell, which is adjusted for optimization. The operating frequency for the power calculation is 100 MHz. Figure 6 shows the operation simulation results of DFFSR_G. When the SN signal is logic 0 and the RN signal is logic 1, the output Q performs a “set” operation to 1, and when SN and RN have the opposite values, Q performs a “reset” operation to 0. The Spectre simulation results show that the proposed DFFSR_G has the same functionality as DFFSR_C. Table 3 summarizes the performance comparison. These results indicate that DFFSR_G, with 28 MOS transistors, achieves an area reduction of about 33% compared with DFFSR_C, which uses 42 MOS transistors, and it is expected to contribute to reducing the sequential circuit area. Its average power is about 3% lower, but its average clock-to-Q delay is about 77% longer, so it is at a disadvantage compared with DFFSR_C in terms of operating speed. The proposed DFFSR_G cell can be applied to the design of small-area, low-power chips suitable for mobile devices. Future research will analyze the operating speed with the aim of matching or exceeding that of CMOS.

Functional simulation result of DFFSR_G.
Summary of DFFSR performance comparison.
| Cell | Number of MOS transistors | clk_q_tplh (s) | clk_q_tphl (s) | Average delay (s) | power_avg (µW/MHz) |
|---|---|---|---|---|---|
| DFFSR_C | 42 | 3.60E-10 | 3.17E-10 | 3.39E-10 | 0.211 |
| DFFSR_G | 28 | 5.97E-10 | 6.04E-10 | 6.00E-10 | 0.205 |
| Improvement | 33% | | | –77% | 3% |
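
The improvement figures can be reproduced from the transistor count, average delay, and power values reported for the two cells. The short sketch below assumes that improvement is defined as (baseline − proposed) / baseline, so that a negative value indicates a degradation; the dictionary keys are illustrative names.

```python
def improvement(baseline, proposed):
    """Percent improvement, positive when the proposed value is smaller."""
    return (baseline - proposed) / baseline * 100.0

# Values for DFFSR_C (baseline) and DFFSR_G (proposed) from the comparison table.
dffsr_c = {"mos_count": 42, "avg_delay_s": 3.39e-10, "power_uw_per_mhz": 0.211}
dffsr_g = {"mos_count": 28, "avg_delay_s": 6.00e-10, "power_uw_per_mhz": 0.205}

for key in dffsr_c:
    print(f"{key}: {improvement(dffsr_c[key], dffsr_g[key]):+.1f}%")
# mos_count: +33.3%
# avg_delay_s: -77.0%
# power_uw_per_mhz: +2.8%
```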
As a sample design to verify the applicability of the RTL design flow and the performance of the DFFSR_G cell, a 64-bit counter with set/reset operations was designed, and layout was performed through P&R. The 64-bit counter was developed using the Verilog-HDL language, and the logic circuit was synthesized using Synopsys DC. The open EDA tool “openROAD” was used for P&R. Figure 7 shows the RTL code of the 64-bit counter.

Verilog-HDL based RTL code for synthesis and P&R verification using DFFSR_G. P&R, placement and routing.
The timing and power values of the remaining GDI cells, including DFFSR_G, were extracted using HSPICE. The 180 nm CMOS process parameters, provided through the Design Kit (D/K), were used for logic circuit synthesis and P&R. The characteristic values of the DFFSR_G cell were added to the Synopsys DC library. As shown in Figure 8, the DFFSR_G library file was generated from the conventional library file by modifying the timing, area, and power fields according to the characterization results. The text-format library file was converted to a binary “db” file using the “write_lib” command of DC and then applied.

Configuration condition for openROAD.
The 64-bit counter layout was placed and routed using the openROAD tool. The target is the “gf180” platform, a 180 nm process with six metal layers. Figure 9 shows the design constraint specification files for openROAD; the core utilization was set to 65%, and the operating frequency was 100 MHz (10 ns period). The SDC file shown in Figure 9 is compatible with Synopsys ICC, ICC II, and Fusion Compiler. Figure 10 shows the layout of the 64-bit counter using DFFSR_G, produced by the openROAD tool. Table 4 shows the circuit synthesis and P&R results. The number of synthesized logic circuit cells is 196, and the total cell area is 8,618.4 μm². The results show an area reduction of about 22% in the sequential (noncombinational) circuit. This is smaller than the 33% cell-level reduction, apparently because the reported area includes the overhead of placement and routing. Depending on the sample chip, the area reduction is expected to reach up to 33%. In addition, the 64-bit counter is a sample intended only for comparing the sequential circuit area. Because it has a relatively small number of cell instances, it is somewhat limited as a verification of the final result. Nevertheless, if DFFSR_G is applied to a large-scale integrated circuit, it is expected to be effective in designing a chip with similar power consumption and a smaller area.

SDC condition for openROAD.

Layout result of 64-bit counter using DFFSR_G.
Summary of 64-bit counter P&R.
| | DFFSR_C | DFFSR_G |
|---|---|---|
| Number of I/O ports | 131 | 131 |
| Number of nets | 390 | 390 |
| Number of cells | 196 | 196 |
| Combinational area (µm²) | 4301.0 | 4301.0 |
| Noncombinational area (µm²) | 5535.1 | 4317.4 |
| Total cell area (µm²) | 9836.2 | 8618.4 |
P&R, placement and routing.
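The area figures in the P&R summary can be turned into reduction percentages with a few lines of arithmetic. The sketch below uses only the reported area values; the variable and function names are illustrative.

```python
# (DFFSR_C, DFFSR_G) areas in square micrometers, as reported in the P&R summary.
area_um2 = {
    "combinational": (4301.0, 4301.0),
    "noncombinational": (5535.1, 4317.4),
    "total": (9836.2, 8618.4),
}

def reduction_percent(cmos, gdi):
    """Area saved by the GDI version, as a percentage of the CMOS area."""
    return (cmos - gdi) / cmos * 100.0

for name, (cmos, gdi) in area_um2.items():
    print(f"{name}: {reduction_percent(cmos, gdi):.1f}%")
# combinational: 0.0%
# noncombinational: 22.0%
# total: 12.4%
```

The combinational area is unchanged, so the 22% saving in the noncombinational area corresponds to roughly 12% of the total cell area for this counter.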
We proposed the DFFSR_G cell as a novel DFF with set/reset function using the GDI technique. DFFSR_G avoids the problems of cascaded GDI while retaining the advantages of GDI. The layout area of this cell is approximately 33% smaller than that of a CMOS cell with the same operation, so it is expected to significantly reduce the area occupied by the sequential circuit in the full chip. As a sample design to verify the applicability of the RTL design flow and the performance of the DFFSR_G cell, a 64-bit counter with set/reset operations was designed, and layout was performed through P&R. The number of synthesized logic circuit cells is 196, and the total cell area is 8,618.4 μm². The P&R results show an area reduction of about 22% in the sequential circuit. If DFFSR_G is applied to a large-scale integrated circuit, it is expected to be effective in designing a chip with similar power consumption and a smaller area. Future work will focus on improving operating speed.
