Trigyn’s financial services client has an immediate need for a Site Reliability Engineer in Jersey City. This is a temp-to-perm opportunity for the right candidate. The particulars of the opportunity are below:
Location: Must be able to work a minimum of one day per week on-site at any of the following locations: Chicago, IL; Dallas, TX; Jersey City, NJ; or Tampa, FL.
Our client is looking to enhance their existing staff with additional Sr. Site Reliability Engineers (SREs) to help their internal team provide production support in a public cloud environment. In this role, you’ll work with blockchain (DLT) and cloud engineers to build the platform, pipeline, and monitoring systems that ensure the application landscape is designed to take full advantage of the firm’s global cloud solution.
The ideal candidate will have strengths in:
• Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
• Mastery of application, data and infrastructure architecture disciplines
• Command of architecture, design and business processes
• Expertise in working in partnership with colleagues throughout the firm, and in leading collaborative teams to achieve common goals
• Hands-on experience managing operations of large-scale, internet-centric production environments for application or infrastructure services serving tens to millions of end users.
• Prior experience at large-scale internet companies or with large-scale internet technologies, where uptime and continuous availability were core to the business.
• Work with the Architecture group to design reusable patterns to deploy to applications, provide governance around adoption, and influence application development teams on roadmaps and designs.
• Identify and partner with Infrastructure teams and AD teams to implement automation opportunities to drive down toil and reduce technical debt.
• Apply standards of cloud compliance to application design to achieve reliability
• Understanding of networking and cloud technologies, such as security, load balancing, and network routing protocols.
• Implement SRE frameworks to support global multi-cloud environments, and ensure the highest SLA levels through operational excellence
• Provides failure analysis / root cause analysis when required
• Provides support to develop & improve the quality of technical engineering documentation
• Provides support to drive the maturity of the software development lifecycle
• Provides quality control of engineering deliverables
• Provides technical consultation to product management
• Performs deployment, administration, management, configuration, testing, and integration tasks related to the blockchain (DLT) platforms in a cloud environment
• Helps to develop new cloud engineering strategies and implementations for the firm
• Champion a DevOps model so that services are automated and elastic across all platforms
• Helps coach and mentor less experienced team members.
• Writes operational documentation and maintains a knowledge base of known issues and their solutions
• Participates in 24x7 SRE on-call rotations and escalation workflows.
• Demonstrated experience as a Site Reliability Engineer
• Bachelor’s degree or equivalent experience in a software engineering discipline
• Expertise in designing, coding, testing, and delivering software in at least one technology stack
• Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
• Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
• Excellent debugging and troubleshooting skills
• Proven experience as a software engineer, including proficiency in at least one systems programming language (Python/Go preferred)
• Understanding of key SRE concepts, such as Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs)
• Understanding of observability in distributed systems
• Experience with Linux
• Experience with Kubernetes and AWS; ideally IAM and VPC networking, Prometheus, and Grafana
• AWS SysOps Administrator or Solutions Architect certification
• Prometheus / PromQL
• Kubernetes CKA/CKAD certifications
For an immediate response, call 732-876-7626 or send your resume to RecruiterMJ@Trigyn.com
TRIGYN TECHNOLOGIES, INC. is an EQUAL OPPORTUNITY EMPLOYER and has been in business for 30 years. TRIGYN is an ISO 9001:2015, ISO 27001:2013 (ISMS) and CMMI Level 5 certified company.