Architecture

Every transistor earns its place.

Custom-silicon discipline means no transistor goes on the die unless it contributes measurably to your model's execution graph. No general-purpose matrix engine overhead. No tensor format converters for formats your model doesn't use. No fallback execution paths for workloads that will never be scheduled.

Target Node

7nm class

Die Area

<200mm²

On-chip SRAM

Up to 6 GB

Peak TOPS

240+ (projected)

Die Layout

Die floor plan.

Abstract representation of the PCU-1 eval die — sectors partitioned by function, sized by workload contribution. Not a production die photo.

Compute Array

~45% die area

SRAM Banks

~35% die area

PCIe PHY

Gen 5 x16
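
As a rough sanity check on the figures above, the sketch below works out the theoretical PCIe Gen 5 x16 link bandwidth and the upper bounds on sector area implied by the stated die size. The 32 GT/s lane rate and 128b/130b encoding come from the PCIe 5.0 standard; the function names are illustrative, and the area figures are upper bounds only because the die area is given as "<200mm²".

```python
# Back-of-envelope arithmetic from the published spec values (illustrative only).
# PCIe 5.0 signals at 32 GT/s per lane and uses 128b/130b line encoding.

GT_PER_S_PER_LANE = 32.0          # PCIe 5.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line code
LANES = 16


def pcie_bandwidth_gbytes_per_s(lanes: int = LANES) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes / 8


def area_budget_mm2(die_area_mm2: float, fraction: float) -> float:
    """Area of a sector given the total die area and its share of the die."""
    return die_area_mm2 * fraction


if __name__ == "__main__":
    print(f"PCIe Gen 5 x16: {pcie_bandwidth_gbytes_per_s():.1f} GB/s per direction")
    # 200 mm² is the stated upper bound on die area, so these are upper bounds too.
    for name, fraction in (("Compute Array", 0.45), ("SRAM Banks", 0.35)):
        print(f"{name}: up to ~{area_budget_mm2(200, fraction):.0f} mm²")
```

Real-world throughput will be lower than the link figure once packet headers and flow control are accounted for.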

Software Stack

From application to silicon — five layers, no gaps.

Every layer is co-designed with the hardware. The application sees a familiar inference API; the silicon executes exactly the operations specified by the frozen model graph.

Application

Your model server, unchanged — standard REST/gRPC inference API

CUDA-compatible Shim Driver

Intercepts CUDA API calls and routes them to the Procunit runtime without framework changes

Procunit Runtime

Execution scheduler, batch manager, memory controller, telemetry

OS Kernel Driver

Linux kernel module, PCIe Gen 5 DMA, interrupt handling

Custom Datapath Silicon

Model-locked ASIC with a dataflow topology derived from your frozen model graph
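
To make "dataflow topology derived from your frozen model graph" more concrete, here is a minimal sketch of one way a fixed operation graph can be turned into a static, staged execution order. It is not Procunit's compiler: the toy graph, the op names, and the `static_schedule` helper are all hypothetical, and the technique shown is an ordinary topological sort from Python's standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical toy graph: each op maps to the set of ops it depends on.
FROZEN_GRAPH = {
    "embed": set(),
    "attn": {"embed"},
    "mlp": {"attn"},
    "residual": {"embed", "mlp"},
    "logits": {"residual"},
}


def static_schedule(graph):
    """Group ops into stages; ops in the same stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every op whose inputs are complete
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(static_schedule(FROZEN_GRAPH)):
        print(f"stage {index}: {', '.join(stage)}")
```

Because the graph is frozen, a schedule like this can be computed once ahead of time rather than at run time, which is the general idea behind fixing a datapath to a single model.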

Integration characteristics

PCIe interface: Gen 5 x16

Form factor: 1U / 2U rack

Power variants: 75W / 150W

OS support: Linux 5.15+

Cooling: Passive / active

Integration

Standard rack. Standard OS. Non-standard performance.

Procunit hardware ships as a standard half-height, half-length PCIe card — the same form factor and power rails your infrastructure team works with today. Procunit is not a network-attached accelerator that requires dedicated fabric. It's not a compute blade that demands custom rack units. It drops into an existing server slot.

Driver installation follows the standard Linux kernel module pattern. The OS enumerates the device at PCIe initialization; Procunit's runtime layer handles all model-specific execution scheduling above the kernel boundary. Your existing monitoring stack reads IPMI and Redfish telemetry without modification.

CUDA-compatible shim allows PyTorch / TensorFlow models to run without framework changes
Monitoring via standard IPMI / Redfish interfaces for existing ops tooling
Driver signing compatible with UEFI Secure Boot
Thermal: operates within standard server rack ambient temperature range (5–40°C)
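
A host can be checked against the "Linux 5.15+" requirement before a card is installed. The sketch below is one possible pre-flight check, assuming a Linux host; the helper names are hypothetical and not part of any Procunit tooling.

```python
import platform
import re

MIN_KERNEL = (5, 15)  # taken from the "OS support: Linux 5.15+" requirement


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supported(release: str) -> bool:
    """True if the kernel release meets the minimum supported version."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string as `uname -r`
    verdict = "meets" if kernel_supported(release) else "is older than"
    print(f"kernel {release} {verdict} the 5.15 minimum")
```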

root@infra-node-07 — ~

$ lspci | grep -i procunit
03:00.0 Processing accelerators: Procunit Inc. PCU-1 Eval (rev 01)
$ cat /sys/bus/pci/devices/0000:03:00.0/power/runtime_status
active
$ procunit-cli status
PCU-1 [0000:03:00.0] — operational
Model: locked (gpt2-inference-v3)
TDP utilization: 62% (93W / 150W)
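
For ops tooling that wants this reading as a number, the sketch below parses the "TDP utilization" line from the status text shown above and recomputes the percentage from the wattages. It assumes the output format is exactly as printed in this session; the `parse_tdp` helper and the sample constant are illustrative, and the real CLI may offer a structured output mode that would be preferable.

```python
import re

# Two lines copied from the session above; the format is assumed to be stable.
SAMPLE_STATUS = """\
Model: locked (gpt2-inference-v3)
TDP utilization: 62% (93W / 150W)
"""

TDP_RE = re.compile(r"TDP utilization:\s*(\d+)%\s*\((\d+)W\s*/\s*(\d+)W\)")


def parse_tdp(status_text: str) -> dict:
    """Return the reported and recomputed TDP utilization from status text."""
    match = TDP_RE.search(status_text)
    if match is None:
        raise ValueError("no 'TDP utilization' line found")
    reported_pct, draw_w, limit_w = (int(group) for group in match.groups())
    return {
        "reported_pct": reported_pct,
        "draw_w": draw_w,
        "limit_w": limit_w,
        "computed_pct": 100 * draw_w / limit_w,  # cross-check of the reported value
    }


if __name__ == "__main__":
    print(parse_tdp(SAMPLE_STATUS))
```

For the sample, 93W out of a 150W limit works out to 62%, matching the reported figure.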

Ready to evaluate?

Start with your model graph.

Share your frozen model in a free initial consultation. NDA-first, no commitment.

Request Evaluation