Kubernetes Platform Engineer – AI Infrastructure at Cisco | San Jose, CA

By ethan.brook, News Editor

Cisco Systems is expanding its technical workforce in San Jose, California, as the networking giant accelerates its pivot toward artificial intelligence. The company is currently seeking a Kubernetes Platform Engineer – AI Infrastructure to design and manage the foundational environments required to run massive AI workloads at scale.

This recruitment drive comes at a critical juncture for Cisco. As the industry shifts from traditional networking to AI-driven connectivity, the company is investing heavily in the underlying compute and orchestration layers. The role focuses on the intersection of cloud-native orchestration and high-performance computing, specifically targeting the infrastructure that supports large language models (LLMs) and generative AI applications.

The position is centered in San Jose, the heart of Cisco’s global operations, where the engineer will be tasked with ensuring that AI workloads are not only deployable but scalable and resilient. By leveraging Kubernetes, Cisco aims to abstract the complexity of GPU-accelerated hardware, allowing AI researchers and developers to deploy models without managing the underlying server architecture manually.

The Technical Mandate for AI Orchestration

At the core of this role is the challenge of scaling Kubernetes to meet the unique demands of AI. Unlike standard microservices, AI workloads require specialized hardware—primarily GPUs—and high-speed interconnects to handle the massive data throughput required for model training and inference. The Kubernetes Platform Engineer will be responsible for building the “plumbing” that allows these resources to be allocated efficiently across a distributed cluster.


Key technical priorities for this infrastructure include the implementation of advanced scheduling and resource management. In a typical AI environment, “bin packing” (fitting as many tasks as possible onto a server) matters less than ensuring low latency and high availability for GPU clusters. The engineer will likely work with Kubernetes-native tools and custom operators to automate the lifecycle of AI pods.
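In practice, GPU scheduling in Kubernetes is expressed through extended resource requests: the `nvidia.com/gpu` resource is advertised by the NVIDIA device plugin, and the scheduler places pods only on nodes with free GPUs. A minimal sketch of such a pod spec (the image name and node label are placeholders, not Cisco specifics):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
    - name: inference
      image: registry.example.com/llm-serving:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 2  # extended resource exposed by the NVIDIA device plugin
  nodeSelector:
    accelerator: nvidia-gpu  # hypothetical node label for GPU-equipped nodes
```

Because GPUs are requested as whole-device limits, the scheduler treats them as non-overcommittable, which is exactly the low-latency, dedicated-capacity behavior AI workloads need.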

Beyond orchestration, the role emphasizes Infrastructure as Code (IaC). Cisco utilizes tools like Terraform and Ansible to ensure that their AI environments are reproducible. This prevents “configuration drift,” where different clusters evolve in ways that make software behave inconsistently across development and production environments.
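The IaC workflow the article describes can be illustrated with a minimal Ansible playbook that declares a Kubernetes namespace idempotently, so repeated runs converge on the same state instead of drifting. This is a generic sketch using the real `kubernetes.core.k8s` module; the namespace name is illustrative, not taken from Cisco's actual setup:

```yaml
# playbook.yml — hypothetical example of declarative, re-runnable provisioning
- name: Provision a namespace for AI training workloads
  hosts: localhost
  tasks:
    - name: Ensure the namespace exists (safe to re-run; no drift)
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: ai-training
```

Because the desired state lives in version control, any divergence between clusters can be detected and corrected simply by re-applying the playbook.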

Bridging the Gap Between Hardware and Software

AI infrastructure is not purely a software challenge; it is a hardware-software co-design problem. The engineer in this position must understand how Kubernetes interacts with the physical layer, including NVIDIA GPUs and the high-speed networking fabrics that Cisco itself produces. This synergy is essential for reducing the “bottleneck” effect often seen when data moves between storage and compute units during AI training.


The role involves managing the complex networking requirements of AI, such as RDMA (Remote Direct Memory Access) and GPUDirect, which allow GPUs to communicate across a network without involving the CPU. Implementing these within a Kubernetes framework requires a deep understanding of Container Network Interface (CNI) plugins and the underlying Linux kernel.
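Within Kubernetes, RDMA-capable interfaces are typically attached to pods as secondary networks (for example via the Multus CNI's `k8s.v1.cni.cncf.io/networks` annotation) and consumed as extended resources advertised by an RDMA device plugin. A hedged sketch follows; the network name, image, and exact resource name all depend on the cluster's device-plugin configuration and are assumptions here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-net  # Multus secondary network (hypothetical name)
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1
          rdma/hca_shared_devices_a: 1  # resource name set by the RDMA device plugin config
```

Pairing the GPU and RDMA device on the same pod is what enables GPUDirect-style transfers that bypass the CPU on the data path.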

Core Requirements for Cisco AI Infrastructure Engineers

| Competency Area | Key Technology / Skill | Primary Objective |
| --- | --- | --- |
| Orchestration | Kubernetes (K8s) | Automated scaling and deployment of AI pods |
| Infrastructure | Terraform / Ansible | Version-controlled environment provisioning |
| Compute | NVIDIA GPUs / CUDA | Optimizing hardware acceleration for LLMs |
| Languages | Go / Python | Developing custom operators and automation scripts |
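The Go/Python competency typically means rendering and validating manifests programmatically rather than hand-editing YAML, as custom operators and automation scripts do. A minimal Python sketch (all names hypothetical; `nvidia.com/gpu` is the standard extended resource exposed by the NVIDIA device plugin):

```python
def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Render a Kubernetes Pod manifest requesting NVIDIA GPUs.

    Raises ValueError for non-positive GPU counts so bad requests
    fail in automation, before they ever reach the cluster.
    """
    if gpus < 1:
        raise ValueError("an AI pod should request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }]
        },
    }

manifest = gpu_pod_manifest("llm-worker", "registry.example.com/llm:latest", 2)
print(manifest["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"])  # 2
```

An operator would feed a manifest like this to the API server; validating it in plain Python first keeps the automation testable without a live cluster.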

Cisco’s Strategic Shift Toward AI-Ready Networking

This hiring push reflects a broader corporate strategy led by CEO Chuck Robbins. Cisco has been aggressively repositioning itself as the “backbone” of the AI era. A primary example of this shift is the steady stream of Cisco newsroom updates on integrating AI into its security and observability tools, along with the strategic acquisition of Splunk to enhance AI-driven data analytics.

By building a robust Kubernetes-based AI platform internally, Cisco can better develop the products it sells to other enterprises. If Cisco can solve the problem of scaling AI infrastructure for its own internal teams, it can translate those learnings into managed services and hardware optimizations for its global customer base.

The San Jose location remains the epicenter of this innovation. While Cisco has embraced hybrid work, the proximity to Silicon Valley’s hardware ecosystem—including partners like NVIDIA and Intel—makes the San Jose hub essential for the rapid prototyping of AI-ready data centers.

Who is Impacted by This Infrastructure?

The work performed by the Kubernetes Platform Engineer directly affects several internal and external stakeholders:

  • AI Researchers: Who gain the ability to spin up massive compute clusters in minutes rather than weeks.
  • Product Engineers: Who can integrate AI capabilities into Cisco’s networking software without worrying about the underlying server stability.
  • Enterprise Customers: Who will eventually see the fruits of this infrastructure in the form of more intelligent, self-healing networks.
  • The San Jose Tech Economy: As Cisco continues to hire for high-specialization roles, it reinforces the region’s status as the primary hub for AI infrastructure development.

Navigating the Application Process

For candidates pursuing the Kubernetes Platform Engineer – AI Infrastructure role, the selection process typically emphasizes a combination of theoretical knowledge and hands-on experience. Cisco generally looks for engineers who have managed Kubernetes clusters at a scale of hundreds or thousands of nodes, as the operational challenges of AI-scale infrastructure simply do not arise in smaller clusters.

Prospective applicants are encouraged to demonstrate their familiarity with the CNCF (Cloud Native Computing Foundation) ecosystem. Experience with tools like Prometheus for monitoring and Grafana for visualization is often a prerequisite, as the ability to observe GPU utilization in real-time is critical for preventing costly resource waste in AI clusters.
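As a concrete example of the kind of observability the article describes, GPU utilization is commonly scraped from NVIDIA's DCGM exporter and queried in Prometheus. A sketch of such a query (the metric and label names assume the dcgm-exporter defaults and may differ per deployment):

```promql
# Average GPU utilization per node, from NVIDIA's dcgm-exporter
avg by (Hostname) (DCGM_FI_DEV_GPU_UTIL)
```

Dashboards built on queries like this make idle GPUs visible immediately, which is the "costly resource waste" the monitoring stack exists to prevent.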

Official applications and detailed job specifications are managed through the Cisco Careers portal, which serves as the primary source for verified role requirements and benefit disclosures.

As Cisco continues to integrate AI across its portfolio, the next confirmed checkpoint for the company’s infrastructure strategy will be its upcoming quarterly earnings reports and product announcements, where the company typically outlines its progress in AI adoption and workforce expansion.

Do you have experience scaling Kubernetes for AI? Share your thoughts or questions about the current state of AI infrastructure in the comments below.
