Context: DW Design and Implementation for Scalability
Scalability is one of the most important design criteria for any DW/BI system design and implementation. Scalability is the ability to increase or decrease the capacity of a system by making effective use of resources; in a clustered deployment, you expand or reduce capacity by adjusting the number of processes available to the cluster. A scalable system can handle an increasing number of requests without adversely affecting response time and throughput.
Overview:
Scaling is the process of increasing or decreasing the capacity of the system by changing the number of processes available to service requests. Scaling out a system provides additional capacity, while scaling in a system reduces capacity. Vertical scaling involves adding more components of a tier to the same computer, to make increased use of the hardware resources on that computer. Horizontal scaling involves adding more computers to the environment. Scalability can be addressed at three tiers: the database tier, the ETL tier, and the reporting tier. The technology components used in this article are Oracle Database at the database tier, Informatica at the ETL tier, and OBIEE at the reporting tier.
2.1.3.1) Scalability at the Database Tier:
When designing for scalability, Oracle RAC (Real Application Clusters) is a strong option because it lets you start small and grow incrementally as your business demands grow. RAC is based on Oracle's Cache Fusion architecture, in which multiple servers run against a single database on a common set of disks and share cached data across their memory. It is designed to deliver near-linear scalability as each additional server is added to the cluster.
Oracle Real Application Clusters runs real applications, hence its name. It runs all Oracle database applications: transaction processing, data warehousing, third-party, and homegrown applications are all supported with no code modification.
Oracle has also simplified the management of clusters. If your business needs to support more data, you add more disks. Similarly, if your business needs to support more users or greater transaction throughput, you add more servers to the cluster. In other words, you can scale out, effectively providing capacity on demand for your business, without having to take your users offline. Throughout, Oracle's clustered database looks and works like a database on a single machine.
2.1.3.2) Scalability at the ETL Tier (Using Informatica)
In Informatica, a grid is a name assigned to a group of nodes that can run sessions and workflows. Running a workflow on a grid means distributing its Session and Command tasks to service processes running on the nodes in the grid. Running a session on a grid divides the workload further by distributing session threads to multiple DTM processes running on nodes in the grid. To run a workflow or session on a grid, you assign resources to nodes, create and configure the grid, and configure the Integration Service to run on the grid.
Resources in PowerCenter include the database connections, files, directories, node names, and operating system types required by a task. You can configure the Integration Service to check resources. When you do this, the Load Balancer matches the resources available to nodes in the grid with the resources required by the workflow. It dispatches tasks in the workflow to nodes where the required resources are available. If the Integration Service is not configured to run on a grid or to check resources, the Load Balancer ignores resource requirements.
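The resource matching described above can be sketched in a few lines. This is only an illustrative model, not Informatica's implementation or API: the names GridNode, eligible_nodes, and check_resources, and the resource labels such as "conn:ORCL_DW", are hypothetical. The sketch assumes a node is eligible for a task when it provides every resource the task requires, and that all nodes are eligible when resource checking is turned off.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class GridNode:
    """Hypothetical model of a grid node and the resources assigned to it."""
    name: str
    resources: FrozenSet[str] = field(default_factory=frozenset)


def eligible_nodes(
    nodes: Iterable[GridNode],
    required: FrozenSet[str],
    check_resources: bool = True,
) -> List[GridNode]:
    """Return the nodes a task may be dispatched to.

    With check_resources=False, resource requirements are ignored and
    every node is returned, mirroring the behaviour described in the text.
    """
    nodes = list(nodes)
    if not check_resources:
        return nodes
    # A node qualifies only if it offers every required resource.
    return [node for node in nodes if required <= node.resources]


if __name__ == "__main__":
    grid = [
        GridNode("node1", frozenset({"conn:ORCL_DW", "dir:/data/src"})),
        GridNode("node2", frozenset({"conn:ORCL_DW"})),
    ]
    needs = frozenset({"conn:ORCL_DW", "dir:/data/src"})
    print([n.name for n in eligible_nodes(grid, needs)])
    print([n.name for n in eligible_nodes(grid, needs, check_resources=False)])
```

In this model a task that needs both the database connection and the source directory can only go to node1, while turning resource checking off makes both nodes candidates.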
Informatica supports load balancing through three algorithms, called dispatch modes. The Load Balancer uses the dispatch mode to select a node for each task, and all Integration Services in a domain use the same dispatch mode.
Round-robin: Each node can be assigned a Maximum Processes threshold. When a request arrives, the Load Balancer dispatches tasks to available nodes in a round-robin fashion. It checks the Maximum Processes threshold on each available node and excludes a node if dispatching a task would cause the threshold to be exceeded.
Metric-based: The Load Balancer evaluates nodes in a round-robin fashion. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded. The Load Balancer continues to evaluate nodes until it finds a node that can accept the task.
Adaptive: This is the most advanced of the three modes. The Load Balancer ranks nodes according to current CPU availability. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded.
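The three dispatch modes can be compared with a small simulation. This is a simplified sketch under stated assumptions, not Informatica's actual algorithm: the Node and LoadBalancer classes are hypothetical, and a single max_cpu_load field stands in for all resource provision thresholds other than Maximum Processes. Round-robin checks only the process threshold, metric-based walks the nodes in the same order but checks every threshold, and adaptive first ranks nodes by the lowest current CPU load.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    max_processes: int          # Maximum Processes threshold
    running: int = 0            # processes currently running on the node
    cpu_load: float = 0.0       # hypothetical current CPU use, 0.0 to 1.0
    max_cpu_load: float = 1.0   # hypothetical stand-in for other thresholds

    def fits_processes(self) -> bool:
        # Would one more task stay within Maximum Processes?
        return self.running + 1 <= self.max_processes

    def fits_all(self) -> bool:
        # Checks every threshold modelled in this sketch.
        return self.fits_processes() and self.cpu_load <= self.max_cpu_load


class LoadBalancer:
    MODES = ("round-robin", "metric-based", "adaptive")

    def __init__(self, nodes: List[Node], mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown dispatch mode: {mode}")
        self.nodes = nodes
        self.mode = mode
        self._next = 0  # round-robin cursor

    def _round_robin_order(self) -> List[Node]:
        n = len(self.nodes)
        return [self.nodes[(self._next + i) % n] for i in range(n)]

    def dispatch(self) -> Optional[Node]:
        """Pick a node for one task, or return None if no node can accept it."""
        if self.mode == "adaptive":
            # Rank by CPU availability: least loaded node first.
            candidates = sorted(self.nodes, key=lambda nd: nd.cpu_load)
            accepts = Node.fits_all
        elif self.mode == "metric-based":
            candidates = self._round_robin_order()
            accepts = Node.fits_all
        else:  # round-robin
            candidates = self._round_robin_order()
            accepts = Node.fits_processes
        for node in candidates:
            if accepts(node):
                if self.mode != "adaptive":
                    # Advance the cursor past the node that was chosen.
                    self._next = (self.nodes.index(node) + 1) % len(self.nodes)
                node.running += 1
                return node
        return None


if __name__ == "__main__":
    nodes = [Node("a", max_processes=1), Node("b", max_processes=2)]
    balancer = LoadBalancer(nodes, "round-robin")
    for _ in range(4):
        chosen = balancer.dispatch()
        print(chosen.name if chosen else "no node available")
```

In the demo, node a accepts one task and node b two, so the fourth task finds no node below its Maximum Processes threshold. A real Load Balancer would hold such a task until a node frees up rather than drop it; the sketch simply returns None.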
2.1.3.3) Scalability at the Reporting Tier (Using OBIEE)
Scaling is the process of increasing or decreasing the capacity of the system by changing the number of processes available to service requests from Oracle Business Intelligence clients. Scaling out a system provides additional capacity, while scaling in a system reduces capacity. Scaling the Oracle Business Intelligence environment applies principally to resource-intensive system processes and Java components. When you deploy more processes, Oracle Business Intelligence can handle more requests while remaining responsive.
Vertical scaling involves adding more Oracle Business Intelligence components to the same computer, to make increased use of the hardware resources on that computer. For example, Oracle Business Intelligence can be vertically scaled by increasing the number of system components servicing requests on a given computer. Horizontal scaling involves adding more computers to the environment. For example, Oracle Business Intelligence is horizontally scaled by distributing the processing of requests across multiple computers.
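The difference between the two scaling directions can be illustrated with simple arithmetic. The sketch below is a hypothetical model, not an OBIEE configuration API: the Deployment class and its method names are assumptions, and it assumes every computer runs the same number of component processes. A negative argument models scaling in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical model: identical computers, each running N processes."""
    computers: int
    processes_per_computer: int

    def __post_init__(self) -> None:
        if self.computers < 1 or self.processes_per_computer < 1:
            raise ValueError("need at least one computer and one process")

    @property
    def total_processes(self) -> int:
        # Capacity is modelled as the number of processes servicing requests.
        return self.computers * self.processes_per_computer

    def scale_vertically(self, extra_processes: int) -> "Deployment":
        # More (or fewer) processes on each existing computer.
        return Deployment(
            self.computers, self.processes_per_computer + extra_processes
        )

    def scale_horizontally(self, extra_computers: int) -> "Deployment":
        # More (or fewer) computers, same processes on each.
        return Deployment(
            self.computers + extra_computers, self.processes_per_computer
        )


if __name__ == "__main__":
    base = Deployment(computers=2, processes_per_computer=2)
    print(base.total_processes)
    print(base.scale_vertically(1).total_processes)
    print(base.scale_horizontally(1).total_processes)
```

In this toy model both directions raise the process count, but only vertical scaling depends on spare hardware capacity on the existing computers, which is why the two are usually combined.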
The three system components that support both horizontal and vertical scale-out are Oracle BI Presentation Services, the Oracle BI Server, and the JavaHost. Oracle BI Scheduler uses Presentation Services and Oracle BI Server processes to perform computationally intense work on its behalf, while the Cluster Controller only manages other components and does not itself do any computationally intense work. Because of this, there is no need to scale out either Oracle BI Scheduler or the Cluster Controller. You can distribute these two processes as needed for high availability deployments, but they do not need to be scaled for capacity.