To ensure consistently high product quality and to reduce development time, the biopharmaceutical industry continually seeks to optimize its manufacturing processes. Research in this area has received significant recent attention as a result of the U.S. Food and Drug Administration (FDA) Process Analytical Technology (PAT) initiative. This initiative has prompted an industry-wide effort to utilize "multivariate data acquisition and analysis tools, modern process analyzers or process analytical chemistry tools, process and endpoint monitoring and control tools, and continuous improvement and knowledge management tools." Through these advances, process developers aim to achieve improved process understanding and continuous process improvement.
In many industrial settings, a large number of process variables are measured and recorded frequently over the entire duration of a batch, producing a large volume of data to analyze. To reduce this burden, data-driven modeling techniques such as principal component analysis (PCA) and partial least squares (PLS) provide a means of reducing the dimensionality of these complex multivariate systems. In effect, PCA and PLS capture the major sources of variation in a system with a small number of orthogonal directions, which greatly reduces the complexity of the system. This dissertation addresses theoretical and practical issues related to batch process monitoring using PCA and PLS. Its primary contributions are in online fault detection and diagnosis, online product quality prediction, and pattern matching in large datasets.
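The dimensional reduction described above can be illustrated with a minimal, dependency-free Python sketch. It assumes the batch data have already been arranged as a two-dimensional matrix (rows are observations, columns are process variables); it autoscales each variable and extracts the leading principal components by power iteration with deflation. The function names `pca` and `scores` are hypothetical, and in practice a library routine such as `numpy.linalg.svd` or scikit-learn's `PCA` would normally be used instead.

```python
import math
import random


def pca(data, k, iters=500, seed=0):
    """Sketch: top-k principal components of an observations x variables matrix.

    Each variable is autoscaled (mean-centered, unit variance); the leading
    eigenvectors of the resulting covariance (correlation) matrix are found by
    power iteration, deflating the matrix after each component.
    Returns (means, stds, loadings, eigenvalues).
    """
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    # A constant variable has zero spread; use 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) or 1.0
            for j in range(m)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in data]
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(m)] for a in range(m)]

    rng = random.Random(seed)
    loadings, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is negligible
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along this direction.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        loadings.append(v)
        eigvals.append(lam)
        # Deflate: remove this component before finding the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return means, stds, loadings, eigvals


def scores(row, means, stds, loadings):
    """Project one observation onto the retained components (its PCA scores)."""
    z = [(x - mu) / s for x, mu, s in zip(row, means, stds)]
    return [sum(p * x for p, x in zip(p_vec, z)) for p_vec in loadings]


# Three perfectly correlated variables collapse onto a single component.
rows = [[t, 2 * t, -t] for t in [1, 2, 3, 4, 5]]
means, stds, loadings, eigvals = pca(rows, k=1)
print(eigvals)  # approximately [3.0]
```

Power iteration is used here only to keep the sketch self-contained; it converges slowly when leading eigenvalues are close, which is one reason to prefer an SVD routine on real plant data.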
In this dissertation, PCA and PLS techniques are used to detect and diagnose a variety of process faults. Once data-driven models have been constructed from historical data, they are used for the online monitoring of batches conducted at an industrial bioprocess pilot plant. To monitor new batches, statistical quality control confidence limits are calculated; when a limit is violated, a diagnostic method is used to determine the specific variables most affected by the process fault.
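The monitoring step above can be sketched for a single new observation using two statistics commonly paired with PCA: Hotelling's T² (variation within the model) and the squared prediction error, SPE or Q (variation outside it). The source does not specify which statistics or diagnostic method the dissertation uses, so this sketch assumes T² and SPE, uses per-variable squared residuals as the contribution measure, and takes the control limits as precomputed inputs (for T², one common choice is an F-distribution limit, which can be evaluated with `scipy.stats.f.ppf`). The `monitor` function and its parameters are hypothetical.

```python
def monitor(z_row, loadings, eigvals, t2_limit, spe_limit):
    """Sketch: T^2, SPE, and variable contributions for one observation.

    z_row: observation already autoscaled with the training means/stds
    loadings: k orthonormal PCA loading vectors from normal-operation data
    eigvals: variance of each retained component (score variances)
    t2_limit, spe_limit: control limits, assumed to be computed offline
    """
    k, m = len(loadings), len(z_row)
    t = [sum(p * x for p, x in zip(p_vec, z_row)) for p_vec in loadings]
    # Hotelling's T^2: squared scores weighted by inverse component variance.
    t2 = sum(ti * ti / lam for ti, lam in zip(t, eigvals))
    # Reconstruct the observation from the retained components.
    x_hat = [sum(t[i] * loadings[i][j] for i in range(k)) for j in range(m)]
    residual = [x - xh for x, xh in zip(z_row, x_hat)]
    # SPE (Q): squared distance from the PCA model; each term is one
    # variable's contribution.
    contributions = [e * e for e in residual]
    spe = sum(contributions)
    # Variables ranked by their contribution to SPE, largest first.
    ranked = sorted(range(m), key=lambda j: contributions[j], reverse=True)
    alarm = t2 > t2_limit or spe > spe_limit
    return t2, spe, contributions, ranked, alarm


# One retained component along variable 0; variable 2 moves off the model.
t2, spe, contrib, ranked, alarm = monitor(
    [2.0, 0.0, 3.0], [[1.0, 0.0, 0.0]], [2.0], t2_limit=5.0, spe_limit=4.0)
print(t2, spe, ranked, alarm)  # 2.0 9.0 [2, 0, 1] True
```

In this toy case T² stays within its limit while SPE does not, and the contribution ranking points to variable 2 as the one driving the violation.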
PLS is used to make online predictions of quality variables from process data. This method accurately predicts final and intermediate titer measurements for batches under normal operating conditions and successfully detects abnormal batches. The approach allows titer values to be monitored and predicted continuously throughout the course of a batch. Several online prediction methods are developed and compared with existing methods.
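The PLS prediction step can be illustrated with a minimal single-response (PLS1) NIPALS sketch in pure Python, where the response would be a quality variable such as titer. This is a generic textbook formulation, not the dissertation's online prediction methods, which must also handle batch trajectories that are only partially complete; that part is not shown. For brevity the variables are mean-centered but not scaled. The names `pls1_fit` and `pls1_predict` are hypothetical; scikit-learn's `PLSRegression` is a library alternative.

```python
import math


def pls1_fit(X, y, n_components):
    """Sketch: PLS1 regression by NIPALS with mean-centered X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(m)] for r in X]  # X residual
    f = [v - y_mean for v in y]                              # y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict y for one new observation by deflating it component by component."""
    x_mean, y_mean, W, P, Q = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Noise-free linear data: with as many components as variables,
# PLS reproduces the ordinary least-squares fit.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [2 * a + 3 * b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [3, 2]), 6))  # 13.0
```

In practice fewer components than variables are retained, chosen for example by cross-validation, so that the model captures the systematic relationship without fitting noise.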
Two pattern-matching metrics, the PCA and PLS similarity factors, are used to make direct comparisons between batches. Unlike other pattern-matching approaches, this method requires neither training data nor a process model, so it is both data-driven and unsupervised. The approach builds a separate data-driven model (either PCA or PLS) for each batch and then calculates the degree of similarity between two batches. A related diagnosis method identifies the key process variables responsible for the dissimilarities between the two batches. The approach can quickly screen large amounts of data for process differences.
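The PCA similarity factor can be sketched as follows, assuming the common definition due to Krzanowski: if each batch is summarized by k orthonormal PCA loading vectors, the factor is (1/k) times the sum of squared cosines of the angles between every pair of loading vectors from the two batches, giving a value between 0 (orthogonal subspaces) and 1 (identical subspaces). The source does not state the formula, so this definition and the function name `pca_similarity` are assumptions; the PLS similarity factor and the related diagnosis method are not shown.

```python
def pca_similarity(L, M):
    """Sketch: Krzanowski-style PCA similarity factor between two batches.

    L, M: lists of k orthonormal loading vectors (each of length m), e.g. from
    separate PCA models of batch A and batch B.
    Returns a value in [0, 1]; 1 means the two k-dimensional subspaces coincide.
    """
    k = len(L)
    if len(M) != k:
        raise ValueError("both models must retain the same number of components")
    total = 0.0
    for l_vec in L:
        for m_vec in M:
            cos = sum(a * b for a, b in zip(l_vec, m_vec))
            total += cos * cos  # squared cosine of the angle between loadings
    return total / k


# Same plane, different basis vectors: the factor depends only on the subspaces.
L = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
s = 2 ** -0.5
print(round(pca_similarity(L, [[s, s, 0.0], [s, -s, 0.0]]), 6))  # 1.0
print(pca_similarity(L, [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]))     # 0.5
```

Because only the loadings enter the calculation, two batches can be compared without any labeled training data, which matches the unsupervised character of the approach described above.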