Company: dltHub (dlthub.com)
Location: In office, Berlin, Germany

Senior Python Developer for Open Source Library


Who We Are

We are building an open source library that enables any Python user to create and deploy data loading pipelines that convert messy, unstructured data into live and regularly updated datasets. By doing so, we address a large market: the number of Python developers increases by millions every year. The vast majority just use bootcamp-level Python as a tool to solve their problems at work. Our mission is to make these users autonomous when they create and use datasets in their organizations.

We are dedicated to the open source model and its sharing culture. We strongly believe that we can build a profitable business on those principles. We want to pursue the “GitHub model”: offer additional services to the open source community while always keeping the library itself open and free.

In our work culture, we value each other’s autonomy and efficiency. We have set hours for communication and deep work. We like automation, so we automate our work before we automate the work of others.

dltHub is based in Berlin and New York City. It was founded by machine learning and data veterans. We are backed by Dig Ventures and many technical founders from companies such as Huggingface, Rasa, Instana, Miro and Matillion. We are looking for driven applicants who believe in our mission to join our early team and make an impact as we continue to prototype and build our product.


What You’ll Do

We seek a talented Python Developer to join our team and contribute to developing open source libraries for data pipelining and loading. Our users are people who use Python in their daily jobs but typically are not software engineers. This creates a unique challenge: writing simple but powerful, clean, and intuitive code that other people can “just use” with the concepts and knowledge they already have. Besides writing code, we dive deeply into PEPs, investigate which patterns are “Pythonic”, and ensure our users understand the library’s interface.

The ideal candidate for this job should have a strong empathy for other developers' needs, a passion for writing clean and intuitive code, and a natural inclination toward documentation and testing.

We have been building for over a year, and parts of our code are in production with clients. Yet we are still early as we continue to prototype and add core functionality to the library. Expect to work on various tasks: data pipelining, scheduling, database interfaces, data lakes and warehouses, integrations with other open source libraries (e.g., Pandas, dbt, Huggingface, Streamlit), performance improvements, etc. On top of that, we do our own research in code generation (e.g., using ChatGPT) and build various extensions for code editors that boost our users' productivity.
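To give a flavor of the core problem the library tackles, here is a plain-Python sketch of turning messy nested records into a flat, typed table. This is a hypothetical illustration of the idea only, not dlt's actual implementation; the `flatten` and `infer_schema` helpers are invented names for this sketch.

```python
# Hypothetical sketch (NOT dlt's real code): flatten messy nested
# records into rows and infer a simple column schema from the values.
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into a single row with double-underscore keys."""
    row: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))  # recurse into nested objects
        else:
            row[name] = value
    return row


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a column -> type-name mapping across all rows."""
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, type(value).__name__)
    return schema


# Messy, nested input records become flat rows plus an inferred schema.
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "Berlin"}}},
    {"id": 2, "user": {"name": "Grace"}, "score": 9.5},
]
rows = [flatten(r) for r in records]
schema = infer_schema(rows)
```

The design goal the posting describes, an interface people can “just use”, is visible even here: the caller hands over raw dictionaries and gets tabular rows and a schema back, with no configuration required.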


Your Tasks and Responsibilities

  • Contribute code to https://github.com/dlt-hub/dlt, write tests and documentation.

  • Maintain the open source project with other team members: review PRs, resolve issues, and talk with community contributors.

  • Actively participate in defining the library architecture and API - the library is our product, and we must take input from other team members and our users.

  • Help the team when we engage the open source community: for example, by providing technical help during developer workshops and bootcamps, or by helping people and other open source projects that build pipelines with dlt.

  • Learn and apply data engineering principles for tasks requiring data loading, dataset creation, and interfacing data lakes and warehouses.


Who You Are

  • You must really like Python and be fluent in writing Python code. You must be familiar with Python typing, unit testing, and writing docstrings.

  • You should be interested in the Python ecosystem: popular libraries, tools, Python internals, PEPs, and how Python is used outside of Software Engineering.

  • You have led, mentored, or supported other developers to write better code or be more efficient at work.

  • You have a degree in computer science, data science, or data engineering, or 5,000 hours of practice in the field.

  • You should have some experience with databases, and understand the relational data model, transactions, concurrency, etc. Knowledge of data warehouses (BigQuery/Snowflake) will help you a lot.

  • You are familiar with GitHub workflows: pull requests, code reviews, GitHub Actions (or other CI/CD services).

  • You are based in Berlin and willing to work in our office on a regular basis.


Nice To Have (any or all of these)

  • Experience in Data Engineering: building data pipelines, dataset modeling, and enabling others to use the data.

  • Experience with DevOps fundamentals: CI systems like GitHub Actions, Docker, Kubernetes, AWS/GCP/DigitalOcean.

  • Experience with machine learning: the toolset, the workflows, and practical applications.


What We Offer

  • We’re still at the beginning of our journey, and you’ll be one of our first employees, which means that you will be able to shape the company and culture with us. That’s why we’re looking for people with a strong startup attitude - we value dedication and out-of-the-box thinking.

  • We have a great headquarters in Berlin. We work both from the office and from home, aiming to be flexible and efficient. At this stage of the company, while we build out the product, it’s crucial for us to meet regularly in the office.

  • The team’s well-being and work-life balance are fundamental to us. Therefore, we offer regular subsidized team lunches, Urban Sports Club membership, and deep work/no meeting days.

  • We want everyone on the team to grow with the company. If you have an open source project or maintain something on GitHub, you can get an hourly budget for it within working hours. Depending on your role and dedication, we also have an ESOP plan for employees. We provide an option to increase your ESOP if you grow with us.