Job Description
API Production Support (Several Openings. 24x7 Production Support Team)
Location: REMOTE
Pay Rate: Open to W2 and C2C options
Position Type: Multiyear Contract
Requirements
- 24x7, Level 2 API support and incident response service team
- Expertise in MuleSoft API troubleshooting and support
- Experience using monitoring tools for API management like Azure Monitor, Splunk and Dynatrace
- Familiarity with ServiceNow tools for incident tracking and documentation
- Ability to use enterprise runbooks and wiki documentation for issue resolution
- Ability to collaborate with multiple internal and external stakeholders, including the Tier 3 team and Support Lead
- Java background preferred, to read stack traces and logs and pinpoint root causes
- Experience with SOAP/REST APIs with Spring Boot and Java microservices
- Experience with MuleSoft Anypoint Platform, including Exchange and monitoring
- Use Azure, Splunk and Dynatrace-based dashboards for monitoring and resolution
- Conduct root cause analysis, escalate issues to internal Tier 3 team as necessary, and engage multiple vendors for resolution when required
- Use enterprise runbooks and wiki documentation, and collaborate with the Tier 3 team or Support Lead, to resolve issues
- Provide 24x7 on-call support as a primary or secondary contact (rotation basis)
- Serve as API support on at least one major incident call per day, averaging 2 hours
- Manage API-related incidents through ServiceNow, based on Moogsoft tickets
- Troubleshoot and resolve issues within L2 incident criteria
- Ensure timely response and resolution of API-related incidents per agreed SLAs
- Perform initial triage, log analysis, and impact assessment
- Ensure monitoring and alerts are accurate, current, and functional
- Utilize enterprise runbooks and wiki documentation for troubleshooting and resolution
- Participate in Problem and Knowledge Management process as requested
- Provide observability support for incident management to proactively identify, diagnose, and resolve issues
- Conduct detailed RCA (Root Cause Analysis) for recurring or high-impact incidents
- Provide RCA reports with contributing factors, corrective actions, and long-term recommendations
- Work with internal teams to implement preventative measures
- Collaborate with the Tier 3 team or support lead when necessary to resolve complex issues
- Maintain documentation of escalations, including logs, timestamps and resolution progress
- After RCA, determine and contact relevant vendors required for issue resolution
- Provide necessary logs, issue descriptions, and troubleshooting details to vendors
- Track vendor resolution progress, coordinate efforts, and update stakeholders (Critical, Non-Critical)
Ref: #850-Rockville (ALTA IT)
Job Tags
Contract work, Remote job