Join us!
Data Engineer
Remote / US East Coast
About the role
As a Data Engineer at MusicInfra, you will play a foundational role in building and scaling the music industry's most comprehensive and reliable dataset. You'll be responsible for designing, deploying, and maintaining the data pipelines that power our MusicGraph™, a proprietary system that links sound recordings, musical works, and their consumption at scale. Your work will support mission-critical products, enabling accurate, real-time ownership insights for our clients.
You'll report directly to the CTO and work closely with engineering, product and operations teams.
Responsibilities
- Design and maintain scalable data pipelines ingesting multi-source data from Digital Service Providers, music publishers, record labels (ERN), and third-party services.
- Build and optimize high-performance, persistent datasets in PostgreSQL and OpenSearch/Elasticsearch to power matching and clustering algorithms.
- Contribute to the development of the MusicGraph™ by building tools that support reconciliation, deduplication, ownership inference, and data lineage tracking.
- Collaborate with fullstack engineers on the team to implement features that integrate the MusicGraph™ into our products.
- Evaluate and tune infrastructure costs and performance in AWS (S3, EC2, ECS, RDS, OpenSearch, Lambda, Glue, Athena).
Required Qualifications
- 3+ years of experience in Data Engineering, with a focus on building and managing large-scale, production-grade data systems.
- Expertise in database design, optimization, and tuning, especially with PostgreSQL.
- Experience working with search technologies such as Elasticsearch or OpenSearch, including index design, query optimization, and scaling.
- Strong hands-on experience with AWS, especially EC2, OpenSearch, ECS, Glue, and Lambda.
- Expertise creating and managing data pipelines, especially using Apache Airflow.
- Ability to write clean, testable, and well-documented code.
- Comfort working across unstructured data, metadata standardization, recording linkage and multidimensional fuzzy matching and clustering problems.
- Proficiency in Python.
Preferred Qualifications
- Experience using AI tools to design and build systems efficiently.
- Experience building or working with rights metadata systems in media, especially in music publishing or copyright domains.
- Familiarity with music metadata standards (CWR, DDEX ERN, DSR, MWN…).
- Background in building systems involving text similarity, clustering, and graph modeling.
- Experience running cost-efficient, scalable OpenSearch clusters over 500M+ documents.
- Strong understanding of data quality, auditing, and versioning in distributed data pipelines.
- A love of music and interest in helping creators get paid fairly!
Ready to apply?
We'd love to hear from you! Send us your resume and tell us why you're interested in this role.