We're Hiring

Build the Future of Observability

Join a team that's redefining how companies monitor and optimize their applications and AI infrastructure.

Why Join obsdeck

We're building something meaningful with a passionate team

Cutting-Edge Technology

Work with the latest AI, ML, and observability technologies at scale.

Strong Team Culture

Collaborative environment where your ideas and contributions matter.

Competitive Benefits

Comprehensive health benefits, equity, flexible work, and more.

Open Positions

Join our founding team and shape the future of observability

DevOps Engineer

Full-time · Remote · San Francisco, CA

Build and maintain the infrastructure that powers our AI-native observability platform. You'll work on scalable cloud infrastructure, CI/CD pipelines, and deployment automation to support our rapidly growing platform serving mission-critical workloads.

What You'll Do

  • Design and implement scalable cloud infrastructure using Kubernetes and Terraform
  • Build and maintain CI/CD pipelines for rapid, reliable deployments
  • Develop monitoring and alerting systems for our infrastructure and services
  • Optimize cloud costs and resource utilization across our platform
  • Implement security best practices and compliance requirements
  • Collaborate with engineering teams to improve deployment processes
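To give a flavor of the monitoring-and-alerting work above, here is a minimal, purely illustrative sketch of threshold alerting with debouncing (firing only after several consecutive breaches). The `AlertRule` name and fields are hypothetical, not part of any obsdeck system:

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical threshold rule: fire when a metric breaches a limit
    for `for_points` consecutive samples, debouncing transient spikes."""
    metric: str
    threshold: float
    for_points: int = 3


def evaluate(rule: AlertRule, samples: list[float]) -> bool:
    """Return True if the last `for_points` samples all exceed the threshold."""
    window = samples[-rule.for_points:]
    return len(window) == rule.for_points and all(s > rule.threshold for s in window)


cpu = AlertRule(metric="cpu_usage_pct", threshold=90.0, for_points=3)
print(evaluate(cpu, [85, 92, 95, 97]))  # three consecutive breaches -> True
print(evaluate(cpu, [85, 95, 80, 97]))  # transient spike only -> False
```

Real alerting systems (e.g. Prometheus's `for` clause) apply the same idea over time windows rather than sample counts.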

What We're Looking For

  • 5+ years of experience in DevOps, SRE, or infrastructure engineering
  • Strong experience with Kubernetes, Docker, and container orchestration
  • Proficiency with cloud services and infrastructure management
  • Infrastructure as Code experience (Terraform, CloudFormation)
  • Experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD)
  • Strong scripting skills (Python, Bash, Go)
  • Understanding of networking, security, and observability concepts

Bonus Points

  • Experience with monitoring and observability tools
  • Background in SRE or platform engineering
  • Experience scaling infrastructure for high-throughput data pipelines
  • Contributions to open-source infrastructure projects

Full Stack Engineer

Full-time · Remote · San Francisco, CA

Design and build user interfaces and APIs that help engineering teams monitor their applications and AI models. You'll work across the entire stack, from React frontends to Node.js/Python backends, creating intuitive experiences for complex observability workflows.

What You'll Do

  • Build responsive, performant web applications using React and TypeScript
  • Design and implement RESTful and GraphQL APIs for data visualization
  • Create data visualization components for metrics, traces, and logs
  • Develop real-time dashboards and alerting interfaces
  • Work with product designers to craft intuitive user experiences
  • Optimize application performance and database queries
  • Write comprehensive tests and maintain high code quality
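The real-time dashboard work above often builds on Server-Sent Events, one of the streaming transports listed under Bonus Points. As a small sketch (the event name and payload are illustrative), this is the `text/event-stream` framing a browser `EventSource` consumes:

```python
import json


def sse_format(event: str, payload: dict) -> str:
    """Frame one Server-Sent Events message per the text/event-stream
    format: an `event:` line, a `data:` line, and a blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


# A dashboard client subscribed via EventSource would receive this frame.
frame = sse_format("metric_update", {"service": "api", "p95_ms": 182})
print(frame)
```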

What We're Looking For

  • 4+ years of full-stack development experience
  • Strong proficiency in React, TypeScript, and modern JavaScript
  • Experience with backend frameworks (Node.js, Express, or Python/FastAPI)
  • Solid understanding of RESTful API design and GraphQL
  • Experience with relational and NoSQL databases (PostgreSQL, MongoDB)
  • Knowledge of state management (Redux, Zustand, or similar)
  • Strong CSS skills and experience with design systems

Bonus Points

  • Experience building data visualization tools (D3.js, Recharts, Plotly)
  • Background in observability or monitoring tools
  • Experience with real-time data streaming (WebSockets, Server-Sent Events)
  • Knowledge of performance optimization and profiling
  • Open-source contributions to developer tools

Data Scientist

Full-time · Remote · San Francisco, CA

Develop machine learning models for anomaly detection, predictive analytics, and root cause analysis. You'll work with large-scale time-series data and build the algorithms that power intelligent monitoring for applications and AI systems.

What You'll Do

  • Design and implement anomaly detection algorithms for time-series metrics
  • Build predictive models to forecast system behavior and prevent outages
  • Develop root cause analysis systems using correlation and causal inference
  • Create automated baseline learning systems for dynamic thresholds
  • Work with streaming data pipelines to enable real-time ML inference
  • Collaborate with engineers to deploy models into production
  • Analyze and interpret complex patterns in application and infrastructure metrics
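The anomaly-detection and baseline-learning bullets above can be illustrated with the simplest statistical approach: a rolling z-score, where the "baseline" is just the mean and spread of a trailing window. This is a toy sketch with synthetic data, far simpler than production methods:

```python
import statistics


def rolling_zscore_anomalies(series: list[float], window: int = 20, z: float = 3.0) -> list[int]:
    """Flag indices deviating more than `z` standard deviations from the
    mean of the preceding `window` points (a crude learned baseline)."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = statistics.fmean(baseline)
        sigma = statistics.pstdev(baseline)
        if sigma > 0 and abs(series[i] - mu) > z * sigma:
            flagged.append(i)
    return flagged


latency_ms = [100.0 + (i % 5) for i in range(40)]  # steady synthetic baseline
latency_ms[30] = 400.0                             # injected spike
print(rolling_zscore_anomalies(latency_ms))        # only the spike index is flagged
```

Production systems layer seasonality modeling, forecasting, and ML-based detectors on top of this basic idea.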

What We're Looking For

  • MS or PhD in Computer Science, Statistics, Mathematics, or related field
  • 4+ years of experience in data science or machine learning
  • Strong expertise in time-series analysis and forecasting methods
  • Proficiency with Python and ML frameworks (TensorFlow, PyTorch, scikit-learn)
  • Experience with anomaly detection techniques (statistical, ML-based)
  • Understanding of distributed systems and their failure modes
  • Strong foundation in statistics and probability theory

Bonus Points

  • Experience with AIOps or observability platforms
  • Background in causal inference or Bayesian methods
  • Experience with streaming ML (Apache Flink, Spark Streaming)
  • Knowledge of AutoML and automated feature engineering
  • Publications in relevant ML conferences

LLM Research Engineer

Full-time · Remote · San Francisco, CA

Research and develop novel approaches to monitoring and optimizing large language models in production. You'll work on cutting-edge problems like LLM performance profiling, inference optimization, and intelligent prompt analysis for enterprise AI applications.

What You'll Do

  • Research and develop novel methods for LLM observability and monitoring
  • Design systems to track and analyze LLM inference performance, latency, and quality
  • Build automated evaluation frameworks for prompt engineering and model outputs
  • Develop techniques for detecting hallucinations, biases, and degradation in LLM responses
  • Create optimization strategies for LLM serving and resource utilization
  • Analyze failure modes in production LLM deployments
  • Publish research findings and contribute to the ML community
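As a taste of the inference-performance tracking mentioned above, here is a hypothetical sketch of a per-call latency tracker; `LLMCallTracker` and the stand-in model are illustrative, not a real obsdeck API:

```python
import statistics
import time


class LLMCallTracker:
    """Hypothetical sketch of per-call LLM latency tracking; in production
    this signal would feed dashboards and latency-regression alerts."""

    def __init__(self) -> None:
        self.latencies_ms: list[float] = []

    def track(self, call, prompt: str) -> str:
        """Invoke an LLM client callable, recording wall-clock latency."""
        start = time.perf_counter()
        response = call(prompt)
        self.latencies_ms.append((time.perf_counter() - start) * 1000)
        return response

    def p95_latency_ms(self) -> float:
        """95th-percentile latency over recorded calls (needs >= 2 calls)."""
        return statistics.quantiles(self.latencies_ms, n=20)[-1]


# Stand-in for a real model client -- purely illustrative.
fake_model = lambda prompt: prompt.upper()
tracker = LLMCallTracker()
for p in ("hello", "world", "obsdeck"):
    tracker.track(fake_model, p)
print(f"p95 latency: {tracker.p95_latency_ms():.4f} ms")
```

Quality signals (hallucination flags, eval scores) would be recorded alongside latency in the same way.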

What We're Looking For

  • PhD or MS in Computer Science, Machine Learning, or related field
  • 3+ years of hands-on experience with large language models
  • Deep understanding of transformer architectures and attention mechanisms
  • Experience with LLM frameworks (Hugging Face, LangChain, LlamaIndex)
  • Strong programming skills in Python and PyTorch
  • Track record of research publications or significant contributions to ML projects
  • Experience with model evaluation metrics and benchmarking

Bonus Points

  • Experience with LLM fine-tuning and RLHF
  • Knowledge of model compression and quantization techniques
  • Background in production ML systems and MLOps
  • Experience with vision-language models (VLMs)
  • Contributions to major LLM open-source projects
  • Publications at top-tier ML conferences (NeurIPS, ICML, ICLR)

Data Engineer

Full-time · Remote · San Francisco, CA

Build and scale the data infrastructure that powers our observability platform. You'll design and implement large-scale data pipelines processing billions of events per day, working with modern data stack tools to enable real-time and batch analytics for applications and AI models.

What You'll Do

  • Design and build scalable data pipelines for metrics, logs, and traces ingestion
  • Develop ETL/ELT workflows using Airflow to process terabytes of observability data
  • Implement data models and schemas in Snowflake for efficient querying and analytics
  • Build real-time streaming pipelines using Kafka, Flink, or Spark Streaming
  • Optimize data storage and query performance for time-series and event data
  • Implement data quality monitoring and validation frameworks
  • Work with data scientists to enable ML feature engineering at scale
  • Build data APIs and services for internal teams and customer integrations
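The streaming-pipeline bullets above revolve around windowed aggregation. As a minimal sketch in plain Python (the event shape is made up), this is the tumbling-window counting that frameworks like Flink and Spark Streaming perform at scale:

```python
from collections import defaultdict


def tumbling_window_counts(events: list[tuple[int, str]], window_s: int = 60) -> dict:
    """Group (timestamp_s, service) events into fixed, non-overlapping
    (tumbling) windows and count events per (window_start, service)."""
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for ts, service in events:
        window_start = (ts // window_s) * window_s
        counts[(window_start, service)] += 1
    return dict(counts)


# Toy event stream: (timestamp in seconds, emitting service).
events = [(0, "api"), (30, "api"), (61, "api"), (45, "db")]
counts = tumbling_window_counts(events)
print(counts)
```

Real pipelines add event-time watermarks and late-arrival handling on top of this core grouping step.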

What We're Looking For

  • 5+ years of experience in data engineering or similar roles
  • Strong expertise with Snowflake, BigQuery, or other cloud data warehouses
  • Hands-on experience with Airflow (or similar orchestration tools like Dagster, Prefect)
  • Proficiency with streaming platforms (Kafka, Kinesis, Pulsar)
  • Experience with distributed processing frameworks (Spark, Flink, Beam)
  • Strong SQL skills and data modeling expertise
  • Proficiency in Python and familiarity with data frameworks (pandas, dask, polars)
  • Understanding of data governance, security, and compliance best practices

Bonus Points

  • Experience with observability or monitoring platforms
  • Knowledge of dbt for data transformation and modeling
  • Experience with columnar storage formats (Parquet, ORC, Iceberg)
  • Familiarity with data catalog tools (DataHub, Amundsen)
  • Understanding of cost optimization in cloud data platforms
  • Background in real-time analytics and event-driven architectures