Misconfigured Apache Airflow Platforms Threaten Organizations



Many organizations using the popular open source Apache Airflow platform to schedule and manage workflows may be exposing credentials and other sensitive data to the Internet because of how they use the technology, researchers have found.
Security vendor Intezer this week said it recently discovered several misconfigured Airflow instances exposing sensitive information belonging to organizations across multiple industries, including manufacturing, media, financial services, information technology, biotech, and health.
The exposed data included user credentials for cloud hosting services, payment processors, and social media platforms, including Slack, AWS, and PayPal. Intezer found that at least some of the data exposed via misconfigured Airflow instances could allow threat actors to gain access to enterprise networks or execute malicious code and malware in production environments and on Apache Airflow itself.
“It’s quite easy to find exposed instances,” says Ryan Robinson, security researcher at Intezer. To find one, all a threat actor must do is scan IP addresses and check them for the expected HTML file. “It’s trivial to find sensitive information on exposed instances, but to exploit it to run code is much harder and requires a solid understanding of each platform,” Robinson adds.
Organizations use Apache Airflow to create and schedule automated workflows, including those related to external services, such as AWS, Google Cloud Platform, Microsoft Azure, Hadoop, Spark, and other Apache software. A survey of its usage in 2020 showed most of its users are data engineers, scientists, or data analysts at midsize to large companies. More than three-quarters of organizations do little to no customization of the technology before using it.
Airflow allows users to orchestrate jobs that involve multiple tasks, Robinson says. For example, he says, a job might involve generating reports, then emailing them to clients; another job might involve collecting, processing, and uploading data to AWS buckets.
While Airflow gives users multiple options to use it securely, organizations can put data at risk through the way they use the platform.
Intezer, for instance, found insecure coding practices to be the most common cause of credential leaks in Airflow. Intezer’s research uncovered several Airflow instances in which passwords were hardcoded either into the Python code for orchestrating tasks or in a feature that allows a user to define a variable value. In other instances, Intezer found users misusing an Airflow feature called Connections and storing passwords in plaintext instead of encrypting them.
“Airflow provides good options to store sensitive information securely through their Connections feature,” Robinson says. The feature allows organizations to ensure that passwords used to push and pull data from other systems are stored in encrypted fashion. “For example, a task will download data from one platform using an API key, then process this data in another task and store this data in a database using a password to connect. One workflow may need to interact with multiple remote systems,” Robinson says. Users often misuse the Connections feature or directly hardcode the credentials into the Python scripts, bypassing the feature altogether, he notes.
Insecure Practices
Intezer found other ways in which users can put enterprise data at risk through insecure use of Airflow. One example involves the setting that controls access to the Airflow configuration file, which often contains sensitive information, such as passwords and keys. If the setting is not secure, anyone can access the configuration file from the Web server user interface, Intezer said in its report. Similarly, a feature in older versions of Airflow that allows users to run ad hoc database queries is dangerous because it requires no authentication and allows anyone with server access to get information from the database.
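As an illustration of the configuration risk, the sketch below reads the text of an Airflow configuration file and warns about two settings. It assumes the `expose_config` option in the `[webserver]` section and the `fernet_key` option in the `[core]` section, which is how Airflow names them to the best of our knowledge; check the configuration reference for the version in use. The function name and warning messages are hypothetical.

```python
import configparser


def audit_airflow_config(cfg_text):
    """Return a list of warnings for risky settings in airflow.cfg-style text."""
    # Interpolation is disabled because Airflow configs often contain
    # literal '%' characters (for example in log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    warnings = []

    # Assumed option: [webserver] expose_config. When enabled, the
    # configuration can be viewed from the web server user interface.
    expose = parser.get("webserver", "expose_config", fallback="False")
    if expose.strip().lower() in ("true", "t", "1"):
        warnings.append("webserver.expose_config is enabled")

    # Assumed option: [core] fernet_key. An empty key suggests connection
    # passwords may not be encrypted in the metadata database.
    if not parser.get("core", "fernet_key", fallback="").strip():
        warnings.append("core.fernet_key is empty")

    return warnings
```

Run against the contents of an `airflow.cfg` file, an empty list means neither condition was found; it does not show that the instance is otherwise securely configured.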
Intezer recommends all organizations using Apache Airflow update to the latest 2.0.0 version of the platform and make sure that only authorized users are allowed to connect to it.
“Version 2.0.0 has made great improvements in security,” Robinson says. The new version has a fully supported API, unlike the experimental API in earlier versions. Other major improvements include enforcing authentication and removing sensitive information from logs, as well as changes to the structure of the main configuration file, he says. Some older, and dangerous, features such as Ad Hoc Query have been deprecated in the new version of Airflow.
Robinson says it is hard to know for sure if attackers are targeting insecurely configured Airflow platforms; however, he says it would be a reasonable assumption that Airflow instances have been targeted.