Small team, deep silicon discipline.
Procunit is built by engineers with roots in physical design, VLSI, ML compilers, and production inference infrastructure. This is a focused, early-stage technical team — six people covering the full stack from RTL to ML runtime, with advisory support from senior semiconductor and infrastructure practitioners.
The people building Procunit.
Spent 2020–2023 as a datacenter silicon architecture lead responsible for GPU provisioning at production inference scale. MS in Computer Engineering from UCLA. Founded Procunit in 2023 after three years of watching stable-model workloads consume GPU capacity at 43–48% average utilization — a structural inefficiency that commodity hardware vendors have no incentive to solve.
Silicon architect with a background in RTL design and physical implementation for application-specific compute. Led digital logic design through two prior ASIC tapeouts at a compute infrastructure company. At Procunit, owns the dataflow topology specification and on-chip memory subsystem architecture for PCU-1.
Active contributor to the MLIR project and LLVM backends. Designed the graph partitioning and op-classification algorithms in Procunit's model ingest pipeline. PhD candidate in programming language theory at Caltech, research focus on compiling for heterogeneous compute topologies.
Twenty years in VLSI physical design, including tapeout contributions at two semiconductor companies across multiple process nodes. At Procunit, leads die floorplanning, power delivery network design, and physical signoff for the PCU-1 7nm-class target. Brings the practical knowledge of what costs area and what doesn't.
Specializes in inference runtime optimization and kernel dispatch scheduling. Prior roles were on cloud ML infrastructure teams focused on inference serving at scale rather than training. At Procunit, owns the runtime layer that sits between the CUDA-compatible shim and the custom datapath: batch scheduling, memory prefetch coordination, and p99 latency profiling.
RTL design and synthesis specialist with a focus on memory subsystems. Designed the on-chip SRAM tile array and high-bandwidth memory controller integration in the PCU-1 reference architecture. The SRAM working-set analysis that drives Procunit's memory hierarchy sizing is largely his work. MS in Electrical Engineering from UC San Diego.
Operational support from builders who've done it before.
Former VP of Architecture at a major GPU vendor. Provides technical guidance on dataflow architecture tradeoffs and silicon design review at each milestone.
Founder of two enterprise ML infrastructure startups. Advises on enterprise sales process, evaluation design, and pricing structure for hardware companies with long procurement cycles.
30+ years in semiconductor manufacturing and EDA toolchains. Advises on foundry relationships, design rule compliance, and cost optimization at the tapeout and packaging stages.
We're growing the team.
If you have a background in silicon design, ML compilers, or production inference infrastructure, we're hiring.
See Open Roles