Architecture

Every transistor earns its place.

Custom silicon discipline means no transistor is on-die without a measurable contribution to your model's execution graph. No general-purpose matrix engine overhead. No tensor format converters for formats your model doesn't use. No fallback execution paths for workloads that will never be scheduled.

Target Node: 7nm class
Die Area: <200mm²
On-chip SRAM: Up to 6 GB
Peak TOPS: 240+ (proj.)

Die Layout

Die floor plan.

Abstract representation of the PCU-1 eval die — sectors partitioned by function, sized by workload contribution. Not a production die photo.

[Figure: abstract die floor plan showing the compute array, SRAM banks, PCIe PHY, and power domain sectors, with amber boundary highlights]
Compute Array: ~45% die area
SRAM Banks: ~35% die area
PCIe PHY: Gen 5 x16
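
As a rough illustration, the floor-plan shares above can be turned into area ceilings. This is only a sketch: it assumes the approximate percentages apply to the stated 200mm² upper bound, and the names `DIE_AREA_MAX_MM2`, `SECTOR_SHARE`, and `sector_area_ceilings` are illustrative, not part of any Procunit tooling.

```python
# Area ceilings implied by the floor-plan shares quoted above.
# 200 mm^2 is the stated upper bound, so every result is a ceiling,
# and the shares themselves are approximate.
DIE_AREA_MAX_MM2 = 200.0
SECTOR_SHARE = {"compute_array": 0.45, "sram_banks": 0.35}


def sector_area_ceilings(die_area_mm2, shares):
    """Return per-sector area ceilings plus the share left for everything else."""
    areas = {name: die_area_mm2 * share for name, share in shares.items()}
    areas["remaining"] = die_area_mm2 * (1.0 - sum(shares.values()))
    return areas


if __name__ == "__main__":
    for name, area in sector_area_ceilings(DIE_AREA_MAX_MM2, SECTOR_SHARE).items():
        print(f"{name}: under {area:.0f} mm^2")
```

The "remaining" entry is simply what the two quoted shares leave over for the PCIe PHY, power domains, and other sectors; the page does not break that portion down further.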
Software Stack

From application to silicon — five layers, no gaps.

Every layer is co-designed with the hardware. The application sees a familiar inference API; the silicon executes exactly the operations specified by the frozen model graph.

Application: your model server, unchanged, behind a standard REST/gRPC inference API
CUDA-compatible Shim Driver: intercepts CUDA API calls and routes them to the Procunit runtime without framework changes
Procunit Runtime: execution scheduler, batch manager, memory controller, telemetry
OS Kernel Driver: Linux kernel module, PCIe Gen 5 DMA, interrupt handling
Custom Datapath Silicon: model-locked ASIC with a dataflow topology derived from your frozen model graph
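
To make the shim layer's role concrete, here is a minimal sketch of call interception as a dispatch table. It is not Procunit's implementation: `RuntimeStub`, `Shim`, and their methods are hypothetical, and the two CUDA runtime function names are used only as routing keys, with their real signatures deliberately simplified.

```python
from typing import Any, Callable, Dict, List, Tuple


class RuntimeStub:
    """Hypothetical stand-in for the runtime layer; it only records requests."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, tuple]] = []

    def allocate(self, nbytes: int) -> int:
        self.log.append(("allocate", (nbytes,)))
        return len(self.log)  # fake buffer handle

    def launch(self, graph_node: str) -> str:
        self.log.append(("launch", (graph_node,)))
        return f"scheduled:{graph_node}"


class Shim:
    """Routes intercepted API names to runtime methods (illustrative only)."""

    def __init__(self, runtime: RuntimeStub) -> None:
        self._routes: Dict[str, Callable[..., Any]] = {
            "cudaMalloc": runtime.allocate,
            "cudaLaunchKernel": runtime.launch,
        }

    def call(self, api_name: str, *args: Any) -> Any:
        handler = self._routes.get(api_name)
        if handler is None:
            # Assumption: calls with no route are rejected rather than emulated.
            raise NotImplementedError(f"no route for {api_name}")
        return handler(*args)


if __name__ == "__main__":
    shim = Shim(RuntimeStub())
    print(shim.call("cudaLaunchKernel", "matmul_0"))
```

The application keeps calling the API names it already knows, and only the routing table decides what the runtime is asked to do.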

Integration characteristics

PCIe interface: Gen 5 x16
Form factor: half-height, half-length PCIe card for 1U / 2U rack servers
Power variants: 75W / 150W
OS support: Linux kernel 5.15+
Cooling: passive / active
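
For capacity planning, the theoretical ceiling of a Gen 5 x16 link can be worked out from the PCIe 5.0 signalling rate (32 GT/s per lane) and its 128b/130b line encoding. The sketch below computes only that raw figure; it ignores packet and protocol overhead, and the function name is illustrative.

```python
GT_PER_S_PER_LANE = 32.0          # PCIe 5.0 signalling rate per lane
LANES = 16                        # x16 link
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def pcie_bandwidth_gb_per_s(gt_per_s, lanes, efficiency):
    """Theoretical per-direction bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * efficiency / 8.0  # 8 bits per byte


if __name__ == "__main__":
    bw = pcie_bandwidth_gb_per_s(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)
    print(f"Gen 5 x16: about {bw:.1f} GB/s per direction")
```

That works out to roughly 63 GB/s in each direction. Achieved DMA throughput will be lower and depends on the host platform.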
Integration

Standard rack. Standard OS. Non-standard performance.

Procunit hardware ships as a standard half-height, half-length PCIe card — the same form factor and power rails your infrastructure team works with today. Procunit is not a network-attached accelerator that requires dedicated fabric. It's not a compute blade that demands custom rack units. It drops into an existing server slot.

Driver installation follows the standard Linux kernel module pattern. The OS enumerates the device at PCIe initialization; Procunit's runtime layer handles all model-specific execution scheduling above the kernel boundary. Your existing monitoring stack reads IPMI and Redfish telemetry without modification.
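
Once the OS has enumerated the card, it should be visible through the standard Linux sysfs PCI tree. The following sketch shows one way to confirm that from Python. The vendor ID is a placeholder (the real one is not given here), and `find_pci_devices` is a hypothetical helper, not part of any Procunit tooling.

```python
from pathlib import Path
from typing import List

SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")
PLACEHOLDER_VENDOR_ID = "0x1234"  # hypothetical; substitute the real vendor ID


def find_pci_devices(vendor_id: str, root: Path = SYSFS_PCI_DEVICES) -> List[str]:
    """Return bus addresses of PCI devices whose sysfs vendor file matches."""
    matches: List[str] = []
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        try:
            found = (dev / "vendor").read_text().strip().lower()
        except OSError:
            continue  # entry without a readable vendor file
        if found == vendor_id.lower():
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    print(find_pci_devices(PLACEHOLDER_VENDOR_ID))
```

The `root` parameter exists so the check can be pointed at a test directory. On a real host the default path applies.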

  • CUDA-compatible shim allows PyTorch / TensorFlow models to run without framework changes
  • Monitoring via standard IPMI / Redfish interfaces for existing ops tooling
  • Driver signing compatible with UEFI Secure Boot
  • Thermal: operates within standard server rack ambient temperature range (5–40°C)
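
As an example of what existing ops tooling might do with the Redfish telemetry mentioned above, this sketch checks intake temperature readings against the 5–40°C ambient range. It assumes a payload shaped like the standard Redfish Thermal resource (`Temperatures` entries with `Name`, `PhysicalContext`, and `ReadingCelsius`). The sample payload and the function name are made up for illustration, and which sensors a given BMC exposes is an assumption.

```python
from typing import Any, Dict, List

AMBIENT_MIN_C = 5.0    # lower bound of the quoted ambient range
AMBIENT_MAX_C = 40.0   # upper bound of the quoted ambient range


def out_of_range_intake_sensors(
    thermal: Dict[str, Any],
    lo: float = AMBIENT_MIN_C,
    hi: float = AMBIENT_MAX_C,
) -> List[str]:
    """Return names of intake sensors whose reading falls outside [lo, hi]."""
    flagged: List[str] = []
    for sensor in thermal.get("Temperatures", []):
        if sensor.get("PhysicalContext") != "Intake":
            continue  # only intake sensors approximate rack ambient
        reading = sensor.get("ReadingCelsius")
        if reading is None:
            continue  # sensor present but not reporting
        if not lo <= reading <= hi:
            flagged.append(sensor.get("Name", "unnamed"))
    return flagged


if __name__ == "__main__":
    # Made-up sample payload, not captured from real hardware.
    sample = {
        "Temperatures": [
            {"Name": "Inlet Temp", "PhysicalContext": "Intake", "ReadingCelsius": 43.0},
            {"Name": "CPU1 Temp", "PhysicalContext": "CPU", "ReadingCelsius": 71.0},
        ]
    }
    print(out_of_range_intake_sensors(sample))
```

Fetching the payload (authentication, chassis ID, transport) is left out because it depends on the BMC in use.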
Ready to evaluate?

Start with your model graph.

Share your frozen model in a free initial consultation. NDA-first, no commitment.

Request Evaluation