Lead Cloud Engineer

Experience: 2+ years

Minimum Education Requirement: This is a professional position, and as such, we require, at minimum, a Bachelor’s degree or higher (or equivalent) in computer science, computer engineering, computer information systems, information technology, or a combination of education equating to the U.S. equivalent of a Bachelor’s degree in one of the aforementioned subjects.

Salary Range: $120,702.00 to $121,000.00 per year

Job Description:

Cloud Network & Infrastructure Architecture

  • Design and implement enterprise-scale cloud network and infrastructure architectures on AWS, including multi-VPC topologies, subnet segmentation, route tables, IP addressing schemes, and hybrid connectivity between AWS environments and on-premises data centers:
      • Architect network segmentation models spanning production, staging, and development VPCs with controlled cross-environment traffic flows.
      • Define CIDR allocation strategies, RFC 1918 addressing schemes, and IP management policies for large-scale AWS environments.
  • Architect secure connectivity using AWS Transit Gateway, VPC Peering, Direct Connect, VPN, NAT Gateways, and Internet Gateways:
      • Design Transit Gateway route tables and attachment policies to enable centralized hub-and-spoke network topologies.
      • Configure and troubleshoot AWS Direct Connect private virtual interfaces (VIFs) and public VIFs for hybrid connectivity.
      • Implement redundant VPN configurations with BGP failover for on-premises-to-cloud connectivity resilience.
  • Define and manage routing policies, DNS configurations (Route 53), traffic flow optimization, and secure ingress/egress patterns for distributed microservices and enterprise application environments.
  • Design and implement high-availability and disaster recovery (HA/DR) architectures, including cross-region failover, active-active and active-passive redundancy models, and recovery validation testing:
      • Develop and test RTO/RPO-aligned DR runbooks for critical cloud network and infrastructure components.
      • Implement Route 53 health checks and DNS failover routing policies for automated cross-region traffic redirection.
  • Conduct comprehensive network and infrastructure capacity planning, performance analysis, latency monitoring, and throughput optimization to maintain resilient, scalable enterprise cloud operations.

Containerization & Platform Engineering

  • Deploy, manage, and optimize containerized workloads using Amazon EKS (Kubernetes), ECS, and ECR, ensuring secure service-to-service communication and network isolation:
      • Design Kubernetes network policies, namespace segmentation, and ingress/egress controller configurations to enforce zero-trust network security within containerized environments.
      • Configure EKS managed node groups and Fargate profiles with appropriate subnet placement, security group assignments, and IAM roles.
      • Implement service mesh configurations for encrypted east-west traffic between microservices.
  • Configure and manage Application Load Balancers (ALB) and Network Load Balancers (NLB) for high availability, intelligent traffic distribution, and performance optimization across distributed services:
      • Define ALB listener rules, host-based and path-based routing policies, and target group health checks.
      • Integrate NLBs with PrivateLink endpoints for secure cross-account and cross-VPC service exposure.
  • Architect and maintain serverless event-driven workflows using AWS Lambda integrated with core networking and infrastructure services including API Gateway, Step Functions, and EventBridge.

Infrastructure as Code & Automation

  • Architect and implement all infrastructure provisioning automation using Terraform, including modular design patterns, remote state management, and workspace-based environment isolation:
      • Design reusable Terraform module libraries for VPCs, EKS clusters, IAM roles, security groups, and Transit Gateway configurations.
      • Implement Terraform remote state using an S3 backend with DynamoDB state locking to enable team-based infrastructure development.
      • Enforce Terraform coding standards, module versioning, and change review workflows via pull request gates.
  • Develop automation scripts, CLI tooling, and operational utilities using Python and Go to reduce operational toil and improve infrastructure reliability:
      • Build Python-based scripts for automated AWS resource inventory, compliance checks, and cost anomaly detection.
      • Develop Go-based CLI tools for operational tasks including EKS cluster health checks and network connectivity validation.
  • Implement GitOps workflows integrating Terraform with GitHub Actions to enforce infrastructure change review, approval gates, and rollback capabilities.

CI/CD Pipeline Design & DevOps Practices

  • Design and maintain robust CI/CD pipelines using GitHub Actions for automated infrastructure and application deployments across AWS environments:
      • Implement multi-stage pipeline workflows with environment-specific approval gates for production deployments.
      • Configure pipeline secrets management using AWS Secrets Manager and HashiCorp Vault integrations.
      • Build reusable GitHub Actions composite actions and workflow templates for standardized deployment patterns.
  • Integrate automated testing frameworks, security scanning tools (SAST/DAST), and compliance validation checks into release workflows to enforce quality and security gates.
  • Implement deployment validation, canary release patterns, and blue-green deployment strategies to minimize downtime and reduce risk in production infrastructure changes.

Cloud Security, Governance & Compliance

  • Architect and implement AWS security controls including IAM policies, role-based access control (RBAC), Service Control Policies (SCPs), and AWS Control Tower governance guardrails:
      • Design least-privilege IAM roles for EKS workloads using IRSA (IAM Roles for Service Accounts).
      • Define SCP guardrails at the AWS Organizations level to enforce compliance across all member accounts.
  • Define and enforce network security architectures including security groups, NACLs, WAF rules, firewall policies, and encryption standards (in-transit and at-rest) across all cloud workloads.
  • Conduct compliance reviews and implement governance frameworks aligned with enterprise security standards and regulatory requirements applicable to life sciences and pharmaceutical environments.

Monitoring, Optimization & Troubleshooting

  • Design and implement centralized logging, monitoring, and alerting frameworks using AWS CloudWatch, CloudTrail, and third-party observability tools:
      • Build CloudWatch dashboards, composite alarms, and anomaly detection for VPC flow logs, network throughput, and application health metrics.
      • Configure CloudTrail for multi-account API activity logging with centralized S3 storage and SNS alerting for security-relevant events.
  • Diagnose and resolve complex network and infrastructure incidents including routing misconfigurations, DNS resolution failures, VPN/Direct Connect connectivity disruptions, latency spikes, and performance bottlenecks across hybrid and cloud environments.
  • Conduct infrastructure cost analysis and implement optimization strategies including right-sizing, reserved capacity planning, and spot instance utilization to maximize cloud spend efficiency.

Send your resume to career@cloudbridgeusa.com