Driven by advances in distributed systems and by social and economic motivations, the scale of systems is on the rise. In large-scale systems, changes caused by failures, maintenance, and additions are the norm rather than the exception, so keeping these systems running manually is difficult, if not impossible. System management, which monitors and controls systems, is a prominent solution to this problem.
However, management use cases differ from system to system, and developing a dedicated management framework for each system defeats the purpose of building system management frameworks in the first place. Management frameworks that enforce management logic authored by users solve this problem. These frameworks let users change the framework's decision logic to suit their specific requirements, and once deployed, they monitor and control target systems in accordance with the user-defined management logic. If the logic makes assertions about only a single component of the system, we call it local logic; if it makes assertions about multiple components of the system, we call it global logic. Global logic depends on a global view of the system, which is non-trivial to support in large-scale systems. However, it enables users to reason about the target system explicitly and therefore provides a natural way to express management use cases.
This dissertation presents a new, dynamic, and robust management architecture that manages large-scale systems by enforcing user-defined management logic that depends on a global view of the managed system. Using empirical analysis, we have shown that it scales to manage 100,000 resources, which demonstrates that the architecture can manage most practical systems. This shows that, despite its dependency on a global view of the managed system, a system management framework can manage systems in accordance with user-defined management logic and still scale to most real-world systems. Furthermore, we have demonstrated that the architecture is robust in the face of failures and stable across different operational conditions.