Hybrid • San Francisco, US • DevOps Engineer
Join a dedicated, experienced team developing the fastest on-device AI inference engine, powering real-world applications.



TaskForge is hiring a DevOps Engineer to help us build and operate the cloud infrastructure that powers our platform at scale. You will be responsible for reliability, deployment velocity, and the systems that let our engineering team ship with confidence.
You will own the infrastructure layer at TaskForge — from our Kubernetes clusters and CI/CD pipelines to our monitoring stack and disaster recovery runbooks. This is a high-ownership role where you will be embedded with the engineering team, influencing how we deploy, observe, and scale our services. You care deeply about reliability and developer experience in equal measure.
Design, provision, and maintain cloud infrastructure on AWS using Terraform.
Manage and scale Kubernetes clusters running our production services.
Build and improve CI/CD pipelines to reduce deployment friction and increase release frequency.
Own the observability stack, including metrics, logging, tracing, and alerting.
Respond to and lead incident resolution for production issues.
Enforce security best practices across infrastructure and deployment processes.
Bring experience with eBPF, service meshes, or advanced Kubernetes networking. Have implemented zero-downtime deployment strategies for stateful services. Have experience with SOC 2 or ISO 27001 compliance from an infrastructure perspective.
You will have full ownership of infrastructure decisions at a company where reliability is not an afterthought but a product value. You will work with a team that respects infrastructure as a craft and gives you the budget and authority to do it right. There are no legacy systems and no technical debt from a prior era.
Our founders are seasoned entrepreneurs with a track record of building successful AI companies. As a compact team of nine highly skilled professionals, we excel at delivering solutions quickly and tackling complex challenges.
We prioritize collaboration and continuous learning, fostering an environment where creativity thrives and groundbreaking ideas come to life.
Operate AI Workflows Without Limits
Streamline task execution, monitor systems actively, and protect your infrastructure with comprehensive enterprise security.
Performance Meets Reliability
TaskForge combines speed, proactive monitoring, and enterprise security to keep your AI systems running smoothly.
