Forget about the “Low Memory” issues when running Airflow (logos are taken from Apache Airflow and Docker).

This article was created under the scope of the first edition of the Data Engineering Zoomcamp by DataTalksClub. I will present a technical variation I made to the initially proposed setup for running Apache Airflow locally (see “What it means to run software locally”) with Docker and Docker Compose. This adaptation was later incorporated into the DE Zoomcamp, as seen in this video. In short, this article covers how to set up the lightweight Docker version to run Airflow.

This is not a tutorial about Airflow or Docker, but an explanation of how to set up a less demanding Docker environment to run Airflow locally. The “full” proposed version to run Airflow inside a Docker container is highly resource-intensive, and hence puts a heavy load on a computer/laptop (the cooling fan of my laptop was always ON). For more information about the full version, I advise you to see the Data Engineering Zoomcamp mentioned above and this article (in Portuguese) by Leandro Bueno.

1. Introduction to Airflow and Docker

1.1 Apache Airflow

Apache Airflow is one of the best-known tools in the data engineering world, so I will not take long to explain it. It is an open-source data orchestration tool that lets you build full end-to-end pipelines by connecting several processes in Directed Acyclic Graphs (DAGs). It connects and organizes tasks that manage data, and it is not a data streaming tool, as stated on the official Airflow website: “Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!)”.

Airbnb developed Airflow in 2014, it was made available as a free tool in 2015, and it was donated to the Apache Foundation in the following year. In addition to being open-source, Airflow has the following main advantages:

- It is more maintainable, versionable, testable, and collaborative because it is all developed in code, according to the official Airflow website.
- Python, a well-known programming language, is used throughout.
- It has several built-in operators, but you can write your own custom operators if they don’t fulfill your requirements.
- Even though everything is developed in code, it has a fantastic web interface that makes it easy to understand the flows.

1.2 Docker

One year ago I started working with Docker, a fantastic set of Platform as a Service (PaaS) products, and I’m now a huge fan. This tool uses OS-level virtualization, which allows fantastic customization of software in containers that can be easily shared with your colleagues or between development environments, so you can “be sure that everyone you share with gets the same container that works in the same way” (from the official Docker website).

The way Docker works is very simple because it uses a client-server architecture. “The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. The Docker client and daemon can run on the same system, or you can connect a Docker client to a remote Docker daemon. The Docker client and daemon communicate using a REST API, over UNIX sockets or a network interface. Another Docker client is Docker Compose, which lets you work with applications consisting of a set of containers.” (text extracted from the official Docker website).

Docker architecture (image from the official Docker website)

The most common way of using Docker is by writing a set of instructions in a text file called a Dockerfile.
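To give a concrete idea of what a Docker Compose setup for Airflow can look like, here is a hypothetical minimal sketch of a lightweight `docker-compose.yml`. It is not the exact file from this setup: the image tag, credentials, and service layout are illustrative assumptions. The key idea of the lightweight variant is using the `LocalExecutor`, which removes the need for the Redis, Celery worker, and Flower services that the full official compose file includes.

```yaml
# Hypothetical minimal sketch, not the article's exact file.
# LocalExecutor runs tasks inside the scheduler, so only a
# metadata database, a webserver, and a scheduler are needed.
services:
  postgres:
    image: postgres:13          # Airflow metadata database
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  airflow-webserver:
    image: apache/airflow:2.3.0  # assumed version tag
    command: webserver
    environment: &airflow-env
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags  # mount local DAG files
    ports:
      - "8080:8080"               # Airflow web UI
    depends_on:
      - postgres

  airflow-scheduler:
    image: apache/airflow:2.3.0
    command: scheduler
    environment: *airflow-env     # reuse the same env via YAML anchor
    volumes:
      - ./dags:/opt/airflow/dags
    depends_on:
      - postgres
```

Note that a real deployment also needs a one-off initialization step before the first start (Airflow 2 provides `airflow db init` and `airflow users create` for this), which the full official compose file wraps in a dedicated init service.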