About
The company
Improbability Labs Inc. is a Canadian technology company founded in 2023. We design, build, and operate high-performance computing infrastructure, AI/ML platforms, and mission-critical software. We also consult — standing up supercomputer clusters, GPU infrastructure, ML pipelines, and production AI for organizations that need serious engineering.
We are building toward HPC-backed command and data processing systems for autonomous endpoints — drones, robots, sensors, and distributed devices. Our work spans defence, government, enterprise, and research. We go where the problem demands real compute and systems that don't break.
Founder
Rahim Khoja — 20+ years of systems engineering. Hacker, builder, and low-level systems thinker.
HPC & GPU infrastructure
- Built and operated production Slurm supercomputers for research computing organizations, including clusters supporting large-scale AI training workloads.
- Authored open-source GPU slicing tooling: CUDA-level library interposition that enables multi-tenant GPU sharing on any NVIDIA hardware without hardware MIG, enforced system-wide via ld.so.preload.
- Designed bare-metal provisioning pipelines that deploy CIS-hardened Slurm, Kubernetes, Ceph, and Proxmox nodes from a single image build system: PXE network boot with CI/CD and SCAP validation.
- Deployed and maintained Open OnDemand web portals for multi-cluster HPC environments — OIDC authentication, interactive compute (Jupyter, VS Code, MATLAB, RStudio), Globus data transfer, and custom dashboard applications.
AI / ML systems
- Production AI platforms on Kubernetes: multi-LLM routing, vector search (pgvector), RAG pipelines, speech-to-text, and autoscaling inference.
- Built and open-sourced a RAG chatbot for a research computing organization — multi-provider LLM backend, vector search over technical documentation, WebSocket streaming.
- GPU-accelerated analytics: ported a technical analysis library of 130+ indicators to NVIDIA cuDF for order-of-magnitude speedups.
- Custom AI agents, workflow automation, and applied LLM integration for organizations adopting AI tooling.
Infrastructure at scale
- Migrated 700+ VMs across four data centres in three countries with 15 minutes of total downtime. Automated 25+ VMware cluster deployments from bare metal.
- Linux, Kubernetes, Proxmox/KVM, VMware, Xen, InfiniBand, and network engineering across ISPs, satellite imaging, geohazard monitoring, and hosting infrastructure.
Applied systems & autonomous backends
- HPC-based algorithmic trading platform with parallel backtesting, live multi-exchange execution, and automated portfolio management.
- Designing HPC-backed command and data processing backends for autonomous endpoints — real-time sensor ingestion, ML inference, and mission coordination via Slurm and Kubeflow.
How we work
- Production-grade from day one. If it can't run unattended at scale, it's not done.
- Low-level when it matters. We write C and CUDA when the problem calls for it. We don't just configure — we build.
- Open-source first. No vendor lock-in. We use the right tool, not the most expensive one.
- Working systems over slide decks. We'd rather show you a running cluster than a roadmap.