Lead Linux Administrator (Contract)
Location: Bellevue, WA (100% On-site)
Duration: 6+ months
Role
Own the administration, performance, and automation of Linux infrastructure powering semiconductor design and HPC workloads. You’ll lead ops standards, tune clusters, and modernize with CI/CD, Git, and IaC across on-prem and Azure.
What you’ll do
- Administer, harden, and upgrade Linux (RHEL/CentOS/Alma/Rocky) across HPC and build farms.
- Operate and optimize HPC clusters and job schedulers (e.g., Slurm, PBS/LSF/SGE).
- Automate provisioning/config with Ansible (or similar) and Infrastructure-as-Code.
- Build CI/CD for infra changes; manage Git workflows and release hygiene.
- Script tooling in Python and Bash for monitoring, remediation, and capacity tasks.
- Integrate on-prem with Azure (hybrid networking, Arc, storage, images).
- Monitor/alert with Prometheus/Grafana (or equivalent); drive SLOs/RCAs.
- Lead incident response, root-cause fixes, patching, and vulnerability remediation.
- Mentor admins; document runbooks/standards.
Must-have
- 8+ years Linux systems administration; 3+ years leading/mentoring.
- Hands-on HPC operations and job schedulers (Slurm/PBS/LSF).
- Strong Python and Bash; git-based CI/CD (GitHub/GitLab/Azure DevOps).
- IaC/automation: Ansible (and/or Terraform/Cloud-Init).
- Hybrid cloud (Azure) fundamentals: VNets, ExpressRoute/S2S VPN, Arc, images.
- Storage & filesystems for HPC: NFS, Lustre/GPFS (or similar); LDAP/AD integration.
- Security hardening, patching, and compliance at scale.
Nice-to-have
- Semiconductor/EDA environments (Cadence/Synopsys/Mentor) and license servers (FlexLM).
- Containerization on HPC (Singularity/Apptainer, Docker where appropriate).
- Observability at scale; cost/perf tuning for compute and storage.