Description
Job Title: GenAI Tester
Duration: 18-Month W2 Contract
Location: Jersey City, NJ
Required Pay Scale: $60-$68/hour
AI Platform Test Engineer (LLM / Automation Focus)
Overview
We are seeking an AI Platform Test Engineer to support Orchestra, an internal platform designed to make AI capabilities accessible across the bank. Orchestra enables secure integration with external large language model (LLM) providers (such as Azure and Google) while enforcing strict access controls based on user authorization.
This role is focused on testing both LLM-powered applications and the underlying AI infrastructure. You will play a key role in evaluating model performance, building automated testing frameworks, and improving the reliability and quality of AI-driven systems.
Key Responsibilities
- Design and implement automated testing strategies for LLM-based systems and infrastructure
- Develop and maintain test frameworks using Python (e.g., Pytest)
- Build and utilize agentic workflows to automate testing processes and improve efficiency
- Use LLMs as evaluators (“LLM-as-a-judge”) to assess model outputs and behavior
- Analyze model responses and define metrics to evaluate output quality and reliability
- Work with tools like LangChain and LangSmith to trace, inspect, and monitor model outputs
- Validate integrations with external LLM providers (e.g., Azure, Google)
- Collaborate with engineering teams to ensure secure, scalable, and high-quality AI implementations
Required Qualifications
- Strong experience with Python and automated testing frameworks (e.g., Pytest)
- Hands-on experience with LLM-based systems and prompt/output evaluation
- Experience designing or working with agentic workflows or automation frameworks
- Knowledge of SQL and NoSQL databases (e.g., MongoDB)
- Experience with modern testing tools such as Playwright
- Ability to define and measure model quality and performance in practical scenarios
Preferred Qualifications
- Experience with LangChain and/or LangSmith for LLM orchestration and observability
- Familiarity with cloud platforms such as Azure, Google Cloud Platform (GCP), or AWS
- Experience testing AI platforms or infrastructure (not just end-user applications)
- Understanding of secure API integrations and access control mechanisms
Key Challenges
- Establishing reliable methods to evaluate LLM output quality (“what is a good response?”)
- Designing scalable, automated testing strategies for non-deterministic AI systems
- Balancing model performance, accuracy, and safety within enterprise constraints
About Matlen Silver
Experience Matters. Let your experience be driven by our experience. For more than 40 years, Matlen Silver has delivered solutions for complex talent and technology needs to Fortune 500 companies and industry leaders. Led by hard work, honesty, and a trusted team of experts, we can say that Matlen Silver technology has created a solutions experience and legacy of success that is the difference in the way the world works.
Matlen Silver is an Equal Opportunity Employer and considers all applicants for all positions without regard to race, color, religion, gender, national origin, age, sexual orientation, veteran status, the presence of a non-job-related medical condition or disability, or any other legally protected status.
If you are a person with a disability and need assistance with the application or at any point in the hiring process, please contact us by email and/or phone: [email protected] // 908-393-8600
At The Matlen Silver Group, Inc., W2 employees are eligible for the following benefits:
- Health, vision, and dental insurance (single and family coverage)
- 401(k) plan (employee contributions only)