About Procunit

We build chips that know one model extremely well.

Procunit exists because specialized AI inference hardware has been a decades-long unfulfilled promise. GPU vendors have every incentive to sell you flexible hardware — none to sell you hardware that's optimal only for your frozen model. We're building the engineering team to finally close that gap.

  • Founded in 2023 in Los Angeles, CA
  • 6 engineers across silicon and ML systems
  • $100K in angel funding; evaluating first workloads

Why Procunit Exists

The inference problem that no GPU vendor will fix.

GPU vendors sell flexibility. They build architectures that handle any tensor operation, any framework, any workload that might exist. That flexibility is precisely what makes them structurally wrong for inference at scale: once your model is frozen, you're paying full TDP for a device capable of op types your graph will never execute.

Amanda Okonkwo spent 2020–2023 as a datacenter silicon architecture lead in Los Angeles, on a team responsible for provisioning GPU compute for stable production inference workloads. Quarter after quarter, the utilization reports came back in the same range: 43–48% average GPU utilization during peak serving hours. The cluster was doing real work — the GPUs simply weren't designed to stop doing everything else.

She founded Procunit in mid-2023 with a specific thesis: model-specialized dataflow ASICs are the only architecture that can actually track an inference workload's compute graph, rather than approximating it through a general-purpose matrix engine. Specialization isn't an optimization — it's the correct abstraction level for the inference problem.

Our Mission

Make production inference economically sustainable at scale.

The compute efficiency of the global AI inference stack is in a structural crisis. Data center capacity is growing faster than the economic models that funded AI development can sustain. Custom silicon isn't a performance upgrade; it's a structural correction.

  • Maximize tokens per watt, not raw FLOPS
  • Treat model specialization as a first-class engineering process
  • Make ASIC development accessible to teams below hyperscale
[Image: abstract processing mesh representing Procunit's inference architecture]
Engineering Culture

What we believe about how to build hardware.

Data over benchmarks

Published peak performance numbers from GPU vendors are measured at optimal batch sizes, data types, and memory configurations that rarely match production conditions. We measure what matters in your workload, not what prints best.

The full stack, honestly

ASIC development requires expertise across ML ops, compiler design, RTL synthesis, physical design, and packaging. We don't outsource the hard parts to contractors — we build the capability in-house.

Iterating like software

Silicon iteration cycles are long by nature. We shorten them by investing heavily in RTL simulation, pre-silicon validation, and tight feedback loops with evaluation customers before tape-out.

Specialization as engineering, not brute force

Adding more SRAM or wider buses is not specialization — it's just a bigger generic chip. True specialization means understanding your model's computation graph and building the dataflow topology that matches it exactly.

Customer workloads are not experiments

Production ML teams carry organizational and financial risk on every infrastructure decision. We treat evaluation engagements as real engineering work, not sales theater.

Efficiency compounds

A 4x improvement in tokens-per-watt is not a 4x improvement in economics. It's closer to 7-9x, because it also reduces cooling cost, the facility capacity required, and heat-related reliability degradation.

Join Us

We're hiring engineers who think in terms of systems.

If the inference bottleneck problem interests you, we'd like to talk.
