AI Engineer – Clinical Data Science
New York, United States
Job Description
- We are looking for an AI Engineer to join our Data Science team, building AI-powered solutions for clinical data processing and analysis within a major pharmaceutical organization. You will design, develop and deploy generative AI systems that automate clinical reporting workflows, extract intelligence from documents, and accelerate data-driven decision making.
- This is a hands-on engineering role - you'll be writing production code, not just building prototypes.
- Generative AI & Automation:
- Develop LLM-powered automation tools for clinical reporting and document generation workflows.
- Build AI-driven code generation pipelines and quality assessment frameworks.
- Design and implement human-in-the-loop review workflows with feedback loops to continuously improve output quality.
- Research and evaluate emerging AI methods, frameworks, and techniques for specific tasks - e.g. comparing fine-tuning vs zero-shot approaches, assessing new document extraction tools, or trialling new agentic frameworks.
- Prototype and benchmark new approaches before recommending adoption.
- Stay current with a rapidly evolving field and bring new ideas to the team.
- Design and build multi-agent systems for data workflows - agents that retrieve, generate, validate, and iterate autonomously.
- Implement agent orchestration using frameworks such as Google ADK, LangGraph, or LangChain.
- Deploy and manage agents on Google Vertex AI.
- Build document processing pipelines (PDFs, Word/DOCX) - extraction, parsing, table detection, structure recognition.
- Design and build RAG pipelines grounded in source documents.
- Process, extract and transform data from unstructured and semi-structured sources.
- Code Quality & Engineering Practices:
- Write clean, well-tested, maintainable Python code following SOLID principles and recognised design patterns.
- Apply single responsibility, dependency inversion, and interface segregation in real codebases - not just theory.
- Write meaningful tests and maintain high standards across the team.
- Refactor and improve existing code as part of normal development workflow.
- Use AI coding tools (e.g. Gemini CLI, GitHub Copilot) as a core part of your development workflow.
- Critically review and validate AI-generated code - understanding what it produces, why, and when it's wrong.
- Write effective prompts to direct AI tools toward correct, secure, well-structured output.
- Know when to use AI and when to write code manually - judgement over speed.
- Integrate and orchestrate LLM providers available through Google Vertex AI (Gemini, etc.).
- Build internal tools and applications using Streamlit and FastAPI.
- Containerize and deploy services using Docker.
- MSc in Data Science, Computer Science, Bioinformatics, or related field (or equivalent practical experience).
- Strong Python skills.
- Hands-on experience building RAG systems or LLM-powered applications (using LangChain, LlamaIndex, or similar frameworks).
- Experience integrating LLM APIs (Google Gemini, OpenAI, or similar) - we work primarily through Google Vertex AI.
- Working knowledge of vector databases (ChromaDB, Weaviate, Qdrant, Pinecone, or similar).
- Cloud platform experience (GCP preferred, especially Vertex AI).
- Docker and containerized deployments.
- Strong software engineering fundamentals - SOLID principles, clean code practices, design patterns, testing, version control (Git), code review.
- Comfortable using AI-assisted development tools (e.g. Gemini CLI, GitHub Copilot) - and critically evaluating what they produce.
- Strongly Preferred:
- Experience with agentic AI patterns - multi-agent orchestration, tool use, autonomous workflows (LangGraph, Google ADK, or similar).
- Document processing experience - extracting and parsing data from PDFs and Word/DOCX files programmatically.
- Understanding of LLM evaluation principles and output quality assessment (BLEU, ROUGE, code execution metrics, or similar).
- Data science fundamentals - Pandas, NumPy, scikit-learn, statistical analysis, data visualization.
- Prompt engineering and optimisation techniques.
- Streamlit application development.
- Clinical trials or pharmaceutical industry experience.
- Familiarity with clinical data standards.
- Awareness of regulatory and data privacy requirements in life sciences.
- Terraform or infrastructure-as-code experience.
- CI/CD pipeline design (GitHub Actions or similar).
- Neo4j, Cypher query language.
- NetworkX for graph analytics.
- Graph-based RAG or knowledge extraction.
- Experience with LLM-driven code generation.
- LLM fine-tuning experience (e.g. LoRA, PEFT, RLHF, Vertex AI model tuning, or similar approaches).
- NLP and text processing (Hugging Face Transformers, Sentence-Transformers).
- PyTorch or TensorFlow (for custom model work if needed).
- Google ADK (Agent Development Kit) or Vertex AI Agent Builder.
- Model Context Protocol (MCP) for tool integration and interoperability.
- Frontend experience (React, TypeScript).
- FastAPI or Flask REST API development.
- PostgreSQL or similar relational databases.
- Languages: Python (primary), SQL, some TypeScript/R.
- AI/ML: LangChain, LlamaIndex, LangGraph, Google ADK, MCP, Hugging Face Transformers, Sentence-Transformers, Google Gemini (via Vertex AI).
- Document Processing: PyMuPDF, python-docx, pdfplumber, OCR tools.
- Data: Pandas, NumPy, SciPy, scikit-learn, Plotly.
- Databases: Vector databases, graph databases, relational databases.
- Infrastructure: Docker, Google Cloud Platform (Vertex AI, GCS), Terraform, GitHub Actions.
- Applications: Streamlit, FastAPI, Flask.
- Tools: Python packaging, testing frameworks, linting, Git.
- You care about code quality - not just making things work, but making them maintainable.
- You're comfortable working across the full stack of an AI application, from data ingestion to user-facing tools.
- You can context-switch between multiple projects and work autonomously.
- You're curious about the clinical/pharmaceutical domain and motivated to learn it.
- You see AI-assisted development as a force multiplier, not a replacement for engineering judgment.
- You're a self-directed learner who researches new methods and tools, evaluates them critically, and knows when to adopt vs when to stick with what works.
Career Focus: Artificial Intelligence
This role is part of a curated selection of life science opportunities featured on Xtalks.