What are the fundamentals of Cloud Computing and Data Science?

         

Data science refers to a collection of related disciplines that focus on using data to create new information and technology, providing useful insights for better decisions. For example, big data techniques address the challenge of analyzing the huge volumes of data generated at high speed. In the real world, computing devices such as cellphones and security cameras constantly generate data and are connected to the internet; this ever-growing network of devices is known as the Internet of Things (IoT). Computers can make decisions based on trusted algorithms that produce accurate predictions, and data analytics takes advantage of exponentially increasing computing power and storage capacity. You need a basic knowledge of statistics to be a successful data scientist. The data industry is driven by languages like Python and R, which come with powerful libraries that implement statistical functions and visualization features, so the programmer or data scientist can automate routine tasks and focus on solving larger problems. Distributed file systems like Hadoop's HDFS and distributed processing engines like Spark play a critical role in big data and enable informed decisions. Machine learning helps detect patterns in data and make better predictions about a dataset; in fraud detection, for example, machine learning dramatically reduces the workload by filtering a large number of data points and presenting only the suspicious candidates. Visualization tools can greatly enhance the presentation of results, and data scientists typically specialize their core job duties in a particular area.
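
A minimal sketch of the fraud-detection idea above, using pandas and scikit-learn; the transactions.csv file, its amount and hour columns, and the 1% contamination rate are illustrative assumptions, not details from the original text.

```python
# Minimal sketch: flag suspicious transactions with an unsupervised model.
# The file name, column names, and contamination rate are hypothetical.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Load the raw transaction data.
transactions = pd.read_csv("transactions.csv")

# Fit a model that scores statistically unusual records as outliers.
model = IsolationForest(contamination=0.01, random_state=42)
transactions["suspicious"] = model.fit_predict(transactions[["amount", "hour"]])

# Keep only the small set of suspicious candidates for human review.
candidates = transactions[transactions["suspicious"] == -1]
print(f"{len(candidates)} of {len(transactions)} transactions flagged for review")
```

This mirrors the workflow described above: the model does the heavy filtering, and the analyst only reviews the candidates it surfaces.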
Data science requires support from cloud computing and virtualization to handle the ever-increasing size, speed, and accuracy requirements of the data sets we have to manage. Cloud computing provides the scalability needed for computing resources: the cloud supplies the processing power and storage space, while the software layer connects virtual machines through a high-speed network and implements distributed file and processing systems. Hadoop and Spark are key elements built on top of these virtual machines, and they solve data science problems when connected to the specific data science application. Cloud computing, virtualization, machine learning, and distributed computing are the technologies that let data scientists do their job effectively. Proxmox is easy to install if you want to build your own cloud and configure the virtualization software yourself, and Weka is a machine learning tool that lets users run various machine learning algorithms in a GUI environment.
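
For the distributed-processing point above, here is a minimal PySpark sketch; the HDFS path, the events dataset, and the device_id column are hypothetical placeholders, not part of the original text.

```python
# Minimal PySpark sketch: a distributed aggregation over a dataset that may be
# larger than any single machine's memory. Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("data-science-on-the-cloud").getOrCreate()

# Spark splits the read and the aggregation across the cluster's virtual machines.
events = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)

# Count events per device and show the busiest devices.
counts = events.groupBy("device_id").agg(F.count("*").alias("event_count"))
counts.orderBy(F.desc("event_count")).show(10)

spark.stop()
```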

Fundamentals of Cloud Computing: If you want to familiarize yourself with Microsoft Azure, you first need to familiarize yourself with cloud computing as a whole. There are three types of cloud computing. Those are,
  1. Public Cloud
  2. Private Cloud
  3. Hybrid Cloud
When we talk about on-premises infrastructure, you need to know what infrastructure is deployed in your company: the servers, hardware, services, and firewalls are managed inside your organization by an internal administrator who is responsible for the functions and features delivered to users. The users consume the services, and you have to update, upgrade, and manage the hardware those services live on.

In a private cloud, the user can be an administrator who works from a portal-based environment to manage the environment, provision servers, and deploy applications, websites, and so on. What is possible depends on the software that manages the private cloud and exposes its functionality in the portal; for example, Microsoft System Center 2012 R2 provides a private cloud infrastructure. A private cloud typically runs in your own data center, so you remain responsible for the hardware, software, and network services.

In a public cloud such as Microsoft Azure, Google Cloud, or AWS, the vendor is responsible for most of those tasks. Public clouds use a leasing model, essentially pay as you go, where you pay for the resources your workloads, applications, and services consume; usage can be the data stored in the cloud or the services delivered by virtual machines. The advantage of public cloud infrastructure is that you can deploy a new application or server at very low cost without buying new hardware to support the additional infrastructure, which ultimately reduces the company's capital expenditure.

The hybrid cloud is a mix of the public and private models: you keep your own internal private data center for some workloads, services, and applications and move others into the public cloud. It is more complex to manage because you have to run both environments and keep them coexisting.

Cloud Computing Services: The cloud is a collection of remote servers connected via computer networks and available through the internet. Virtualization is what makes cloud computing possible: a hypervisor runs on the physical hardware, and many operating systems such as Windows and Linux can be installed on top of it. You can fire up virtual machines and leverage the vast resources of a cloud provider as business operations grow gradually or exponentially; it gives you the flexibility to expand your infrastructure quickly if necessary. Cloud computing companies specialize in managing server farms and know how to maximize profit and minimize expenses. There are three major cloud service models. Those are,
   1. Infrastructure as a Service (IaaS) - The infrastructure itself is what you consume as a service. For example, if you want a computer lab of 10 PCs, you can use AWS EC2 to start 10 instances and put them in the same network; that becomes your computer lab (see the sketch after this list).
   2. Platform as a Service (PaaS) - The platform to run your code. You go to the cloud, choose the compiler or interpreter you want, and run your code; a cloud IDE can be used to write and run it.
   3. Software as a Service (SaaS) - Largely self-explanatory. For example, Google Docs and Dropbox offer cloud-hosted software and free storage that you use directly for your own purposes.
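
To make the IaaS example concrete, here is a minimal boto3 sketch that launches ten EC2 instances in one subnet; the region, AMI ID, subnet ID, and instance type are placeholders you would replace with values from your own AWS account.

```python
# Minimal sketch: start 10 EC2 instances in the same subnet ("the computer lab").
# All identifiers below are placeholders, not real resources.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder AMI
    InstanceType="t3.micro",
    MinCount=10,
    MaxCount=10,
    SubnetId="subnet-0123456789abcdef0",  # same subnet = same network
)

# Print the IDs of the newly launched instances.
for instance in response["Instances"]:
    print(instance["InstanceId"])
```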

Application Migration to Cloud: A successful migration of a large application portfolio requires a couple of things. Those are,

    * Think and plan strategically, and
    * Rapidly iterate through feedback loops to fix the things that are going wrong.

But there are a lot of things to consider when migrating to the cloud, including the application architecture, the ability to scale out, its distributed nature, and so on. When we migrate applications to the cloud, we get a new architecture with different properties and characteristics than traditional systems. One advantage of cloud migration is the ability to run an active-active architecture: the application runs in two places at the same time, in real time, and one instance takes over if the other fails (a minimal sketch follows the list below). Here we are automating things, and that is part of what makes being in the cloud worth it to the business. Application migration is worthwhile because
   * you are selling the value to the stakeholders who are funding the cloud migration;
   * it makes the business more agile and delivers value;
   * we come to understand the applications broadly, looking at both the specific needs of each application and the general consensus; and
   * we modernize things along the way, updating database models and technologies, improving security and governance, and leveraging the systems for whatever purpose we need.
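
As a rough illustration of the active-active idea mentioned above, here is a minimal Python sketch of client-side failover between two live endpoints; the URLs and health-check paths are hypothetical, and real deployments would normally rely on a load balancer or DNS failover rather than code like this.

```python
# Minimal sketch: both endpoints serve traffic (active-active); if one stops
# answering its health check, requests fail over to the other. URLs are hypothetical.
import requests

ENDPOINTS = [
    "https://app-region-a.example.com/health",
    "https://app-region-b.example.com/health",
]

def pick_healthy_endpoint():
    """Return the first endpoint that answers its health check."""
    for url in ENDPOINTS:
        try:
            if requests.get(url, timeout=2).status_code == 200:
                return url
        except requests.RequestException:
            continue  # this instance is down; fail over to the next one
    raise RuntimeError("No healthy endpoint available")

print("Routing traffic to:", pick_healthy_endpoint())
```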

The important steps in cloud migration are shown in the following figure.
[Figure: important steps in cloud migration]
Ultimately, there is a bit of trial and error, so establish operational processes and continuous improvement.

Data Migration to Cloud: Data is the highest priority when migrating to the cloud. Data is basically the business, and it is everywhere in the enterprise; data is the killer application of cloud computing. We migrate to the cloud and find new value and new kinds of innovation by running databases, big data systems, predictive analytics, and AI-based systems in the cloud. Data selection is therefore a critical process: understanding which databases are bound to which applications, what those applications are doing, and the security, compliance, and performance issues involved is what leads to success. The business case, migration, testing, and deployment all rest on understanding the data; you need to look at the applications that depend on the data and plan the deployment around them. Ultimately, the goal of leveraging data is to lower operational costs, integrate existing data silos so that different databases can communicate with one another as a single dataset, and influence actions and outcomes rather than just store data, so the business has the information it needs to run better. We are not going to move every piece of data that exists on-premises into the cloud; we may move 70% of it, and we have to deal with integration between on-premises data stores and those that exist in the cloud. So make sure to build a solid architectural foundation for success when considering data, and avoid duplicate data and data silos. In a real-world scenario, you need to consider the following things when you migrate to the cloud:
 * It is necessary to understand the total cost of ownership (TCO) for the first year, second year, fifth year, and so on, including the TCO for applications, databases, and cloud instances, as well as the ROI (a rough calculation sketch follows this list). The top five TCO/ROI factors are:
   - Value of Agility
   - Cost to retire selected applications, infrastructure or data centers
   - Changes required to maintain a service level
   - Software costs
   - Organizational transformation costs
 * Ensure that a solid business case exists before the migration begins, including how the technology is going to be applied.
 * Determine the value metrics or value points, such as agility, compressed time to market, and cost savings, against which the total cost will be measured.
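
As a rough sketch of the TCO comparison described above, the following Python snippet totals annual cost categories over a five-year horizon; every figure is a made-up placeholder, not data from the original text.

```python
# Minimal sketch: compare multi-year TCO for on-premises vs. cloud.
# All dollar amounts are placeholders for illustration only.
yearly_on_prem = {
    "hardware_refresh": 120_000,
    "software_licenses": 80_000,
    "data_center_operations": 150_000,
}
yearly_cloud = {
    "cloud_instances": 140_000,
    "software_licenses": 60_000,
    "organizational_transformation": 30_000,  # amortized retraining/process cost
}

years = 5
tco_on_prem = sum(yearly_on_prem.values()) * years
tco_cloud = sum(yearly_cloud.values()) * years

print(f"{years}-year on-premises TCO: ${tco_on_prem:,}")
print(f"{years}-year cloud TCO:       ${tco_cloud:,}")
print(f"Projected difference:        ${tco_on_prem - tco_cloud:,}")
```

In practice the cost categories would come from the factors listed above (agility value, retirement costs, service-level changes, software, and organizational transformation), measured against the value metrics the business has agreed on.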





