



# Heterogeneous 3D Chiplet Integration for AI Applications

# Mitsu Koyanagi

Global INTegration Initiative (GINTI), Tohoku University, Japan

Tohoku-MicroTech Co., Ltd., Japan

### Outline

- Introduction
- 3D Chiplet Integration
- 3D Heterogeneous Integration
- Neuro/AI System by 3D Integration Technology
- Conclusions

#### First Proposal of 3D Integration Technology in Tohoku University

The 1<sup>st</sup> proposal of 3D integration using wafer bonding in 1989



M. Koyanagi, Proc. 8th Symposium on Future Electron Devices, pp.50-60 (Oct. 1989)

[Figure: cross-section of the proposed wafer-bonded 3D structure; labels: MOSFET, Si substrate, SiO<sub>2</sub>]



# 3D Chiplet Integration in Tohoku University



M. Koyanagi, Stanford University Workshop (CIS Round Table) (2005)

## First 3D-IC Test Chips with TSVs Fabricated in Tohoku Univ.



# Practical Implementation of 3D-IC Chips with TSVs in Semiconductor Manufacturers



# **Chiplet Integration**

#### UCIe (Universal Chiplet Interconnect Express)

Disaggregate the SoC first, then integrate the pieces, each built on its best/optimized node



Monolithic chip

#### IBM z14 Processor

14 nm technology node, 6.1B transistors, chip size: 696 mm<sup>2</sup> (ISSCC 2018)



Source: S-H You (IEDM2020 Short Course)
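The cost argument for disaggregating a large monolithic chip can be illustrated with the textbook Poisson yield model, Y = exp(-A·D0). The sketch below is not taken from the talk: the defect density `D0`, the even four-way split, and the helper names are assumptions chosen for illustration, and it ignores bonding yield, die-to-die interface area, and packaging cost. Only the 696 mm<sup>2</sup> chip size comes from the slide.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson die-yield model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_system(total_area_mm2: float, n_dies: int,
                            defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working system when the design is split
    into n_dies equal dies and only known-good dies are assembled."""
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_cm2)


TOTAL_AREA_MM2 = 696.0  # monolithic chip size quoted on the slide
D0 = 0.1                # ASSUMED defect density in defects/cm^2 (illustrative)

mono = silicon_per_good_system(TOTAL_AREA_MM2, 1, D0)
quad = silicon_per_good_system(TOTAL_AREA_MM2, 4, D0)
print(f"monolithic: {mono:.0f} mm^2 of silicon per good system")
print(f"4 chiplets: {quad:.0f} mm^2 of silicon per good system")
```

Under these assumptions the four-chiplet split consumes noticeably less silicon per working system, because a defect scraps only a quarter-size die rather than the whole chip.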

#### **Chiplets & Active Interposer : Concepts**

- Chiplet for:
  - Lower cost
  - Higher modularity
  - From IP-reuse to circuit-reuse
- Active Interposer, the « Smart Hub » for:
  - Scalable System Interconnects
  - PHYs for off-chip communication
  - Power Management
  - DFT, thermal, etc.
- Active interposer built in a mature CMOS technology (cost-performance trade-off)
- Using passive interposers (2.5D) or organic substrate, but with limitations regarding:
  - Chiplet connectivity (scalability)
  - Less scalable function (heterogeneity)



D. Dutoit et al. (CEA-LETI), IEDM2020

#### AMD Instinct<sup>™</sup> MI300A Modular Chiplet Package



Alan Smith et al., IEEE ISSCC, pp.208-209 (2024)

## Direct D2W Hybrid Bonding and Collective D2W Hybrid Bonding (Reconfigured W2W Hybrid Bonding)





# Process Flow of D2W Hybrid Bonding



S. Lee et al., IEEE ECTC, pp.1085-1089 (2022)

# Hybrid Bonding in Tohoku University



IEEE ECTC (2022)

# Cu Grain Morphology: SEM



Grain size ~2 µm (tiny and extremely randomly oriented)

Grain size >10 µm (very large and relatively oriented)

# Cu Grain Crystallographic Orientation: EBSD



M. Murugesan, M. Koyanagi, T. Fukushima, IEEE ECTC (2022)

## 3D Chiplet Integration on Wafer by D2W Hybrid Bonding



HBM with 8 memory layers

Jaesik Lee, IEEE IEDM, SC2.2 (2023)

# **Pick-and-Place vs Self-Assembly**





Simultaneous Bonding of Many Dies with Different Size by Self-Assembly



# High-Speed Water Droplet Spray for Self-Assembly



# Water Droplets Supplied on Small Hydrophilic Area



# Photo of Various-size Self-Assembled Chips on 8-inch Wafer Prepared by Hybrid Self-Assembly



## Photo of µLED Array Prepared by Hybrid Self-Assembly





Self-Assembled Micro-LED chips (75 µm × 125 µm)

## Combined Process Sequence of Self-Assembly and Hybrid Bonding (SA-Hybrid Bonding)



# **Results of SA-Hybrid Bonding**






- Using this method, we have so far assembled 6 layers. We will continue to explore the best conditions to achieve structures of more than 12 layers.
- The average assembly accuracy is currently below 500 µm, but the data are still scattered; future work will aim to reduce this scatter.
- With the current thermal bonding, the Cu-Cu bonding interface is still visible. We will continue to investigate the influence of the liquid on bonding and the optimal bonding conditions.

Heterogeneous Integration

# Device Level

# Architecture/System Level

## **Device Level Heterogeneous Integration**

### 3D Heterogeneous Integration Technology in Tohoku Univ.



T. Fukushima, M. Koyanagi et al., IEEE IEDM, p.359 (2005)

K-W Lee, M. Koyanagi et al., IEEE IEDM, p.531 (2009)

# New Reconfigured Wafer-to-Wafer 3D Integration



T. Fukushima and M. Koyanagi et al., IEDM, p.439 (2007)

# Various Kinds of Heterogeneous Integration

- Heterogeneous Integration with Non-Si devices
- Heterogeneous Integration with Sensor/MEMS
- Heterogeneous Integration with Photonics/Optics
- Heterogeneous Integration with Bionics
- etc.

## Heterogeneous Integration with Compound Semiconductor Chip (DAHI)



#### **DAHI on CMOS**



#### **DAHI on SiC interposer**



#### 3D Heterogeneous Integration with MEMS Using Self-Assembly



K-W Lee, M. Koyanagi et al., 3D-IC, Sept. 28, 2009

#### 2.5D/3D Heterogeneous Integrations of CMOS, MEMS and Passive Device Chips on Si Substrate



Electronic-Photonic Systems-on-Chip for Compute, Communications and Sensing

## **45SPCLO process**



Rakowski et al., OFC 2020

- Same transistors as in 45nm SOI
- Number of features optimized for photonics
  - Ge photodetectors, Si dopings, Si partial etch, SiN, V-groove couplers etc.

Vladimir Stojanović, IEEE ISSCC Forum 6.8 (2024)

# Future Systems-In-Package with Optical I/O



| Gen | Electrical I/F (advanced package) | Electrical modules | Electrical Tx / Rx IOs | Electrical data rate [Gbps/IO] | Optical I/F (CW-WDM) ports | λs / port | Optical data rate [Gbps/λ] | Optical chiplet (Tx+Rx) | Off-package IO BW (4-8 package) |
|-----|-----------------------------------|--------------------|------------------------|--------------------------------|----------------------------|-----------|----------------------------|-------------------------|---------------------------------|
| 1   | AIB                               | 24                 | 20 / 20                | 2                              | 8                          | 8         | 16                         | 2 Tbps                  | 8-16 Tbps                       |
| 2   | AIB                               | 16                 | 80 / 80                | 2                              | 8                          | 8         | 32                         | 4 Tbps                  | 16-32 Tbps                      |

- Gen 1 and Gen 2 already built and hardware validated
- 16-32 Tbps off-socket optical I/O bandwidth possible today
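The optical bandwidth figures in the table can be cross-checked from the port, wavelength, and data-rate columns. The short sketch below does that arithmetic. It assumes that "Tx+Rx" means both directions are counted, and it reads the "4-8" in the last column header as 4 to 8 optical chiplets per package; the header is truncated in the source, so that reading is an assumption, as is the function name.

```python
def optical_chiplet_tbps(ports: int, wavelengths_per_port: int,
                         gbps_per_wavelength: float) -> float:
    """Aggregate Tx+Rx bandwidth of one optical chiplet in Tbps (1 Tbps = 1000 Gbps)."""
    per_direction_gbps = ports * wavelengths_per_port * gbps_per_wavelength
    return 2 * per_direction_gbps / 1000.0  # factor 2: Tx and Rx both counted


# (ports, wavelengths per port, Gbps per wavelength) taken from the table rows
generations = {1: (8, 8, 16), 2: (8, 8, 32)}

for gen, (ports, lambdas, rate) in generations.items():
    chiplet = optical_chiplet_tbps(ports, lambdas, rate)
    # ASSUMPTION: 4 to 8 optical chiplets per package
    print(f"Gen {gen}: {chiplet:.3f} Tbps per chiplet, "
          f"{4 * chiplet:.1f} to {8 * chiplet:.1f} Tbps per package")
```

The per-chiplet results round to the 2 Tbps and 4 Tbps entries, and multiplying by 4 and 8 lands on the 8-16 Tbps and 16-32 Tbps ranges to within rounding.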

Vladimir Stojanović, IEEE ISSCC Forum 6.8 (2024)

(Source: Wade et al., HotChips 2023)

# 3D Heterogeneous Integration with Photonics (Photonic 3D Integration)



K-W Lee, M. Koyanagi et al., IEEE Trans. on Electron Devices, vol.58, p.748 (2011)

# Photonic 2.5D/3D Heterogeneous Integration (Optical Interposer Embedded with VCSEL/PD Chips)



A. Noriki, M. Koyanagi et al., Jpn. J. Appl. Phys., vol. 48, p. C113-1 (2009)

#### MEC System Module with Optical Interconnection by Heterogeneous Chiplet Integration



# Fabrication Flow of Optical Interconnection with Grating Coupler and Plasmon Coupler



Photo after transferring Si optical waveguide patterns onto a Si interposer wafer



#### VCSEL Chiplet Integration on Glass Interposer by Self-Assembly

#### Self-Assembly of 12-ch VCSEL Chip on Glass Interposer


### **3D** Heterogeneous Integration with Bionics (Bionic 3D Integration)

**3D-Retina Chip Implantation into Human Eye (Retinal Prosthesis)**



### EEP Waveforms with/without Flashlight



## 3D Heterogeneous Integration with Bionics (Bionic 3D Integration)

### Brain-Machine Interface (BMI) and Intelligent Si Neural Probe with Multi-electrodes and Sensors





Recording of Neuron Potential in a Brain Using Si Neural Probe

S. Kanno, T. Tanaka, M. Koyanagi et al., Jpn. J. Appl. Phys., vol. 48, p. C189-1 (2009)



Si neural probe mounted on PCB





Si neural probe with piezoresistive force sensor

## *Tohoku Univ. Intelligent Neural Probe family*

(Prof. Tetsu Tanaka)



4-shank double-sided Si neural probe



*Pillar-type electrode array* (10x10)





Optical waveguide

Si neural probe with optical waveguides

Si neural probe with microfluidic channel

## Neural stimulation

|                   | Electrical | Chemical                  | Optical                   |
|-------------------|------------|---------------------------|---------------------------|
| Stimulus speed    | Fast 🙂     | Slow                      | Fast 🙂                    |
| Neural activities | Excitation | Excitation, inhibition 🙂 | Excitation, inhibition 🙂 |
| Cell selectivity  | No         | No                        | Yes 🙂                     |

#### Optogenetics

Gene transfer of a protein molecule that responds to light of a specific wavelength enables the control of neuronal excitation and inhibition by light.

### Neural probe with light-control function

Optical fiber/Optical waveguide/µLED





Realization of neural stimulation with high spatial and temporal resolution and cell selectivity.

By Courtesy of Prof. Tetsu Tanaka (Tohoku University)

### 3D Heterogeneous Integration with Bionics (Bionic 3D Integration)

#### Review: Bi-directional Brain Interface from Risk Management Perspective



Opri et al. (UF Florida), "Chronic embedded cortico-thalamic closed-loop deep brain stimulation for the treatment of essential tremor," Science Translational Medicine, 2020

Tim Denison, IEEE ISSCC, Forum 4.1 (2024)

## Architecture/System Level Heterogeneous Integration

Examples of Heterogeneous Integration

**High-Performance Compute** 



Naffziger et al., AMD [44]



Gomes et al., Intel [45]

#### Automotive Microcontroller



Loke et al., NXP [46]

Figures not drawn to scale

Alvin Loke, IEEE ISSCC, Forum 3.1 (2024)

### Architecture/System Level Heterogeneous Integration

#### **Homogeneous Integration**

- Power, performance, area and chip cost
- Cross-IP data link and latency
- Thermal- and IR-aware die floorplanning

#### **Heterogeneous Integration**

- System cost and time-to-market
- Cross-die data link and latency
- Multiphysics-aware chip integration

Lawrence Loh, IEEE ISSCC, Plenary (2020)

### Evolution of IC from Device Integration to System Integration



Kevin Zhang, IEEE ISSCC, Plenary 1.1 (2024)

## Key Technologies for AI

### **3D Integration, Chiplet Integration, Heterogeneous Integration**

### AI Driving Global Growth of 3D-IC Chiplets



Lip-Bu Tan, IEEE ISSCC, Plenary 1.4 (2024)

## LLM Computations (Training vs. Inference)



#### Training (model making)

- Large batch
- Compute-intensive
- Throughput-oriented HW

#### Inference (service)

- Small batch
- Memory-intensive
- Latency-oriented HW

Joo-Young Kim, IEEE ISSCC Forum 2.5 (2024)
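One way to see why small-batch inference is memory-intensive while large-batch training is compute-intensive is to estimate the arithmetic intensity (FLOPs per byte moved) of a single dense layer. The sketch below is a simplified roofline-style estimate, not a model from the cited talk: the layer width `D`, the 2-byte values, and the assumption that weights are read once per batch are all illustrative.

```python
def arithmetic_intensity(batch: int, d_in: int, d_out: int,
                         bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one dense layer Y = X @ W, X of shape (batch, d_in)."""
    flops = 2 * batch * d_in * d_out                      # multiply + add per MAC
    weight_bytes = d_in * d_out * bytes_per_value         # weights read once per batch
    act_bytes = batch * (d_in + d_out) * bytes_per_value  # inputs read, outputs written
    return flops / (weight_bytes + act_bytes)


D = 4096  # ASSUMED hidden width, for illustration only

for batch in (1, 16, 256, 4096):
    print(f"batch {batch:5d}: {arithmetic_intensity(batch, D, D):8.1f} FLOPs/byte")
```

At batch size 1 the layer performs roughly one FLOP per byte of weights fetched, so memory bandwidth dominates; as the batch grows, each fetched weight is reused across more inputs and the layer becomes compute-bound.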

### Growth in application complexity



S. Bianco et al., "Benchmark Analysis of Representative Deep Neural Network Architectures," IEEE Access, 2018

Source: ISSCC 2024 Short Course, "Machine learning hardware: considerations and accelerator approaches" (© 2024 IEEE International Solid-State Circuits Conference)

## Global Network in IoT/AI/5G Era



### **Requirements for AI System and Technologies**



### Energy efficiency challenge



https://www.top500.org/lists/green500/

Rangharajan Venkatesan, IEEE ISSCC SC1 (2024)

### Optimizing Networks (for the Edge)

Every token is compared with all other tokens to compute the attention map $\rightarrow$ quadratic complexity in the number of tokens



Bram Verhoef, IEEE ISSCC Forum 2.7 (2024)
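The quadratic cost of the attention map can be made concrete by counting score computations. The sketch below is a deliberately simplified dot-product attention in plain Python: it uses the token vectors directly as both queries and keys (no learned projections), and the toy token generator `make_tokens` is hypothetical. It is meant only to show that doubling the number of tokens quadruples the number of scores.

```python
import math


def make_tokens(n: int, d: int = 4):
    """Hypothetical toy token vectors (deterministic, for illustration only)."""
    return [[float((i * 7 + j * 3) % 5) for j in range(d)] for i in range(n)]


def attention_map(tokens):
    """Softmax-normalised dot-product attention map.

    Returns (attn, n_scores), where attn[i][j] is the weight token i places on
    token j and n_scores counts the pairwise score computations performed.
    """
    d = len(tokens[0])
    attn = []
    n_scores = 0
    for query in tokens:
        scores = []
        for key in tokens:  # every token is compared with every token
            scores.append(sum(a * b for a, b in zip(query, key)) / math.sqrt(d))
            n_scores += 1
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn, n_scores


for n in (8, 16, 32):
    _, count = attention_map(make_tokens(n))
    print(f"{n} tokens -> {count} pairwise scores")
```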

### Optimizing Networks (for the Edge)

#### **Overall Goals**



Bram Verhoef, IEEE ISSCC Forum 2.7 (2024)

## Memory-Based AI Processor to Achieve High Energy Efficiency and Compactness

- CIM (computing-in-memory): uses the memory array itself as a processing unit
- PIM (processing-in-memory): uses logic embedded near the memory array as a processing unit
- PNM (processing-near-memory): uses an additional chip for processing inside a memory package or a set



Kyomin Sohn, IEEE ISSCC Forum 2.6 (2024)

### Neuro-Centric Sensor System Project in Tohoku Univ.



National Project in Japan (NEDO) of AI Chip (2019-2020)

## Cyclic Neuro Operation in 3D Stacked AI Chip (Forward propagation/ Backward propagation)



3D Stacked AI Chip

#### Cross-sectional View of 3D Stacked AI Chip with Four Stacked Layers



#### Neuron-Block Partition to Densify Sparse Synaptic Connections in CNN



### Image Recognition Using 3D Stacked AI Chip

#### Paradigm Shift from CNN to ViT (Vision Transformer)



- ViT significantly reduces the number of matrix product operations (MPOs).
- ViT is therefore suitable for edge applications.

| Network         | Number of matrix products   | Accuracy |
|-----------------|-----------------------------|----------|
| CNN (Optimized) | 6,212                       | 71.4%    |
| Tiny ViT-V1     | 1,169                       | 71.3%    |
| Tiny ViT-V2     | 150                         | 76.6%    |

By Courtesy of Prof. T. Okatani, Tohoku University
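To put a number on "significantly", the reduction factors can be computed directly from the table. The snippet below uses only the values listed there; the helper name `reduction_factor` is introduced here for illustration.

```python
def reduction_factor(baseline_ops: int, ops: int) -> float:
    """How many times fewer matrix products a network needs than the baseline."""
    return baseline_ops / ops


# (number of matrix products, accuracy in %) as listed in the table
results = {
    "CNN (Optimized)": (6212, 71.4),
    "Tiny ViT-V1": (1169, 71.3),
    "Tiny ViT-V2": (150, 76.6),
}

baseline_ops, baseline_acc = results["CNN (Optimized)"]
for name, (ops, acc) in results.items():
    print(f"{name}: {reduction_factor(baseline_ops, ops):.1f}x fewer matrix products, "
          f"{acc - baseline_acc:+.1f} points accuracy vs. CNN")
```

By this measure Tiny ViT-V2 needs about 41 times fewer matrix products than the optimized CNN while reporting higher accuracy.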

### Face Recognition Using 3D Stacked AI chip

Average error: yaw angle 8.0 deg., pitch angle 8.7 deg., roll angle 7.6 deg.



By Courtesy of Prof. T. Okatani, Tohoku University

# Implementation of Reservoir Neural Network in 3D Stacked AI Chip with Cyclic Neuro Operation



Mapping of Reservoir Neural Network to 3D Stacked AI Chip with Cyclic Neuro Operation

[Figure: time-unrolled mapping onto the first and second layers; labels include signals $x_1(t)$ through $x_1(t+5)$, $x_2(t)$ through $x_2(t+5)$, $y_1(t)$, $y_1(t+1)$, $y_2(t)$, $y_2(t+1)$ and weights W12, W21, W22]

Configuration of Reservoir Neural Network with Simple Learning

K. Fukuda, Yoshihiko Horio et al., NOLTA, IEICE (2021)
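For readers unfamiliar with reservoir computing, the sketch below shows the general idea behind "simple learning": the recurrent reservoir weights stay fixed and random, and only a linear readout is trained. This is a generic echo-state-style example in plain Python, not the cited authors' network or its mapping onto the 3D stacked chip. The reservoir size, weight scaling, normalised-LMS readout rule, and sine-prediction task are all assumptions made for illustration.

```python
import math
import random


def make_weights(n, seed=0, scale=0.9):
    """Fixed random reservoir. Each row's absolute sum is at most `scale` (< 1),
    so the tanh state update is contractive (fading memory)."""
    rng = random.Random(seed)
    w_res = [[scale * rng.uniform(-1, 1) / n for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-1, 1) for _ in range(n)]
    return w_res, w_in


def step(state, u, w_res, w_in):
    """One reservoir update: x(t+1) = tanh(W_res x(t) + W_in u(t))."""
    return [math.tanh(sum(w * x for w, x in zip(row, state)) + wi * u)
            for row, wi in zip(w_res, w_in)]


def run(signal, w_res, w_in, w_out, lr=0.0):
    """Drive the reservoir with `signal` and predict the next sample.

    If lr > 0, only the readout w_out is updated (in place) by normalised LMS;
    the reservoir weights are never modified. Returns the mean squared error.
    """
    state = [0.0] * len(w_in)
    sq_err = 0.0
    for u, target in zip(signal, signal[1:]):
        state = step(state, u, w_res, w_in)
        y = sum(w * x for w, x in zip(w_out, state))
        err = target - y
        sq_err += err * err
        if lr > 0.0:
            norm = 1e-9 + sum(x * x for x in state)
            for i, x in enumerate(state):
                w_out[i] += lr * err * x / norm
    return sq_err / (len(signal) - 1)


N = 20                                              # ASSUMED reservoir size
signal = [math.sin(0.2 * t) for t in range(1000)]   # ASSUMED toy task
w_res, w_in = make_weights(N)
w_out = [0.0] * N

mse_before = run(signal, w_res, w_in, w_out)        # untrained readout
for _ in range(3):
    run(signal, w_res, w_in, w_out, lr=0.2)         # train the readout only
mse_after = run(signal, w_res, w_in, w_out)         # evaluate with frozen readout
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

Because learning touches only the output weights, the training step is a cheap local update, which is one reason reservoir networks are attractive for hardware implementation.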

### Voice Recognition Using 3D Stacked AI chip



### **Next Generation AI**

#### LLM and LMM in Mobile Devices



Bor-Sung Liang, IEEE ISSCC Forum 2.8 (2024)

#### LMM AI Chip Project in Tohoku University (AI-Sensor Fusion)



Cross-sectional Structure of Sensor Integrated AI System Module

### AI Chip Based on Computing-in-Memory (CIM) with SRAM and Transformer Algorithm



#### LMM AI Chip Project in Tohoku University (AI-Sensor Fusion)

Energy-Efficient Neuro Processing and Reduced Data Transfer by Integrating Sensor Devices with Feature Extraction Function



Voice Signal Processing Chip

Heterogeneous Chiplet Integration and AI Need Various Kinds of Knowledge and Skills



### 12-inch 3D Production Line in Tohoku Univ. GINTI (Global INTegration Initiative)



## International Cooperation in GINTI



### **Conclusions**

- 3D heterogeneous integration and chiplet integration are the key to future intelligent systems such as HPC, AI/ML, post-5G/6G systems and quantum computer systems.
- 3D heterogeneous integration and chiplet integration need various kinds of knowledge and skills. International collaboration is therefore indispensable.
