Driving Innovation and Cost Efficiency for a Global Healthcare Giant

Our customer is a healthcare company operating in more than 100 countries that develops and produces medicines.

Client

A Global Pharmaceutical Company

Year

2023

Stack

Databricks, GitHub, Terraform, CDKTF, Apache Spark, Azure Data Lake Storage, Delta Lake, Python

⸻ Business Impact

Significant Reduction in Production Cost

This project helps our customer excel in a competitive industry by giving it full transparency into its production process, a scalable, integrated data platform, and actionable insights powered by AI and machine learning services from the Cloud.

The solution, delivered within a few months, significantly increased total yield and reduced costs in our customer's production operations.

⸻ Starting Point and Goals

A compressed timeline and layers of legacy systems

In the realm of data, time is money in its purest form: every moment a company goes without data-driven optimization, it forfeits millions in potential growth and savings.

The main challenges are:

  1. Data is generated in different formats
  2. Historical and current data are stored on different infrastructure
  3. The manufacturing process uses different production systems

Our customer's production line generates massive amounts of data on its own. In addition, pharmaceutical production follows stringent, standards-driven processes that require additional data sets to be created manually on top of all the sensor data. The result is a large volume of current and historical data from many sources, each in its own format, which cannot be analyzed directly and must first be unified and prepared to unlock insights.

Furthermore, reliance on on-premises data storage, driven by strict data protection requirements, limited the company's ability to use cloud-based advanced analytics and AI-driven services effectively.

This constraint prevented the company from using its production data to its full potential to increase production efficiency.


⸻ Solution

Streamline Data and Leverage Full Cloud Services

MobiLab's Data Platform Delivery Team designed a new data platform that can scale across business units and enables reliable, real-time data integration.

To deliver on time, we prioritized understanding the customer's business, maintained close communication, and took a pragmatic approach to quickly build a digital twin of the production process. We implemented the medallion architecture, which organizes data into raw (bronze), cleaned (silver), and aggregated (gold) layers in the Cloud, enabling efficient data ingestion, better scalability, and improved data quality.
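To make the medallion layering concrete, here is a minimal, pure-Python sketch of how heterogeneous inputs could move from bronze to gold. It is illustrative only: the CSV sensor export, the JSON batch records, their field names, and the per-batch mean temperature are all hypothetical stand-ins, and in the actual platform these layers would be Delta Lake tables processed with Spark rather than Python lists.

```python
import csv
import io
import json
from collections import defaultdict
from statistics import mean

# Hypothetical raw inputs: a sensor export (CSV) and manually created batch records (JSON).
SENSOR_CSV = """device_id,batch_id,temperature_c
reactor-1,B001,37.2
reactor-1,B001,37.6
reactor-2,B002,not_available
reactor-2,B002,36.9
"""
BATCH_RECORDS_JSON = '[{"batch": "B001", "operator": "A"}, {"batch": "B002", "operator": "B"}]'


def bronze(sensor_csv: str, records_json: str) -> dict:
    """Bronze: land the raw data as-is, only tagging each row with its source."""
    sensors = [dict(row, _source="sensor_csv") for row in csv.DictReader(io.StringIO(sensor_csv))]
    records = [dict(row, _source="batch_json") for row in json.loads(records_json)]
    return {"sensors": sensors, "records": records}


def silver(raw: dict) -> list:
    """Silver: enforce types, drop unparseable readings, and join the two sources."""
    operators = {r["batch"]: r["operator"] for r in raw["records"]}
    cleaned = []
    for row in raw["sensors"]:
        try:
            temp = float(row["temperature_c"])
        except ValueError:
            continue  # reject readings that cannot be parsed as numbers
        cleaned.append({
            "device_id": row["device_id"],
            "batch_id": row["batch_id"],
            "temperature_c": temp,
            "operator": operators.get(row["batch_id"]),
        })
    return cleaned


def gold(rows: list) -> dict:
    """Gold: a business-level aggregate, here the mean temperature per batch."""
    by_batch = defaultdict(list)
    for row in rows:
        by_batch[row["batch_id"]].append(row["temperature_c"])
    return {batch: round(mean(temps), 2) for batch, temps in by_batch.items()}
```

Running `gold(silver(bronze(SENSOR_CSV, BATCH_RECORDS_JSON)))` yields `{'B001': 37.4, 'B002': 36.9}`: the malformed `not_available` reading is filtered out in the silver layer, so it never distorts the gold aggregate. Keeping the raw bronze copy untouched means the cleaning rules can later be changed and replayed without re-extracting data from the source systems.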

Data is extracted from the sensors of multiple manufacturing devices, then combined and linked with the rest of the production data to deliver an aggregated view for further analysis by our customer's data science team. We used Databricks Auto Loader to incrementally ingest frequently arriving data, improving usability and scalability. The data is then managed within the Data Lakehouse, with Unity Catalog providing secure and compliant data access.
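A minimal sketch of how an Auto Loader ingestion job into a bronze Delta table might look in PySpark. The source path, checkpoint path, target table name, and JSON file format are hypothetical placeholders, not details from the project. The `cloudFiles` source is specific to Databricks and is not available in open-source Spark, so `start_bronze_ingestion` assumes an existing Databricks `SparkSession` is passed in.

```python
def autoloader_options(file_format: str, schema_location: str) -> dict:
    """Build the Auto Loader (cloudFiles) reader options."""
    return {
        "cloudFiles.format": file_format,              # format of the incoming files
        "cloudFiles.schemaLocation": schema_location,  # where the inferred schema is tracked
        "cloudFiles.inferColumnTypes": "true",         # infer typed columns instead of strings
    }


def start_bronze_ingestion(spark, source_path: str, checkpoint_path: str,
                           target_table: str, file_format: str = "json"):
    """Incrementally ingest new files from source_path into a Delta table.

    Assumes `spark` is a SparkSession running on Databricks. The checkpoint
    records which files were already processed, so each run picks up only
    newly arrived files.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options(file_format, checkpoint_path).items():
        reader = reader.option(key, value)
    return (
        reader.load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process all pending files, then stop
        .toTable(target_table)
    )
```

On Databricks, a call such as `start_bronze_ingestion(spark, "/Volumes/raw/sensors", "/Volumes/raw/_checkpoints/sensors", "prod.bronze.sensor_readings")` (all names hypothetical) would start the stream. The `availableNow` trigger suits scheduled runs; for continuously arriving sensor data, a `processingTime` trigger would keep the stream running instead.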

⸻ Tech Highlights

➞  Streamlined Data Ingestion

Our reusable data extraction pipelines seamlessly combine real-time and historical data from multiple sources, using micro-batch ingestion for infrequently updated sources. This creates a cohesive ecosystem that delivers real-time insights and improves operational efficiency.
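The idea of combining historical and live data before ingesting it in micro-batches can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the project's pipeline: the record shape (`ts`, `source`, `value`), the deduplication key, and the batch size are hypothetical, and each input is assumed to be already sorted by timestamp.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical record shape: {"ts": int, "source": str, "value": float}


def merge_sources(*sources: Iterable[dict]) -> Iterator[dict]:
    """Merge timestamp-sorted sources into one ordered stream, dropping duplicates.

    A record seen in both the historical backfill and the live feed is kept once.
    The `seen` set grows without bound; a real pipeline would cap it with a watermark.
    """
    seen = set()
    for record in heapq.merge(*sources, key=lambda r: r["ts"]):
        key = (record["source"], record["ts"])
        if key in seen:
            continue
        seen.add(key)
        yield record


def micro_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a record stream into lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch
```

For example, merging a historical list with timestamps 1 and 3 and a live list with timestamps 2, 3 (a duplicate of the same source) and 4, then batching by three, produces two batches with timestamps `[1, 2, 3]` and `[4]`. Batching trades a little latency for far fewer write operations, which is why it suits sources that only update occasionally.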

➞  Easy Analytics

Moving data to the Cloud for analytics unlocks valuable insights and scalability, frees organizations from tool constraints, and lets them exploit the full power of their data. This framework is also well suited to implementing AI models.

➞  Protection and Governance of the Data

Data is governed with Unity Catalog, Databricks' unified governance solution, which provides simple and robust access control over data and AI models in a Lakehouse environment.
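Unity Catalog permissions are expressed as SQL `GRANT` statements on a three-level namespace (`catalog.schema.table`). To read a table, a principal needs `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table itself. The helper below is a hypothetical sketch that generates those statements; the catalog, schema, table, and group names in the example are assumptions, and identifiers are interpolated without escaping, so it is only safe for trusted, simple names.

```python
from typing import List


def grant_read_access(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return the Unity Catalog GRANT statements a principal needs to read one table.

    Assumes simple, trusted identifiers; names are not escaped.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]
```

On Databricks, each statement could be executed with `spark.sql(...)`, for example for a hypothetical `data-science` group reading `prod.gold.batch_kpis`. Granting at the narrowest level needed keeps the data science team's access limited to curated gold tables rather than raw production data.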

➞  Source-driven Architecture

The codebase, including the infrastructure code, is hosted on GitHub for efficient development. Automated CI/CD deployments make the infrastructure repeatable and keep deployment processes consistent and reliable.