Sunday, 15 March 2020

What is Data Visualization and Machine Learning?

A good analysis project starts with exploratory visualization that helps you develop a hypothesis, and it ends with carefully polished figures that make the final results obvious. The actual number crunching is hidden in the middle, sometimes set aside entirely. Machine learning helps find the signal in the data and points to the directions most promising for further work. The Matplotlib library helps data scientists because it integrates well with other libraries; Pandas provides a wrapper around Matplotlib that handles much of the plot formatting. The first data scientists were computer programmers or machine learning experts working on big data problems. They analyzed datasets of HTML pages, image files, emails, and web server logs, and wrote production software that implements analytics ideas in real time. A data scientist extracts data from raw sources and spends most of the time getting and cleaning data before choosing the statistical method to apply. The so-called unicorns can construct a good statistical model, write quality software, and relate the results in a meaningful way to business problems.
AI is the broader term under which machine learning and deep learning fall. AI emphasizes the creation of intelligent machines that work and react like humans. AI is the broad goal; machine learning and deep learning are the actual implementation techniques. Deep learning is a subset of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
Machine learning is a subset of AI that gives machines the ability to learn without being explicitly programmed. The data and the learning algorithm are the key: the machine trains itself on the data provided to it. When you have a dataset, you start by dividing it into two parts, a training set and a test set. Typically 80% of the data is used as training data and the remaining 20% as test data to evaluate the model. So you train the machine with 80% of your data to create the machine learning model, and once the model is created, you evaluate it on the test data. If the accuracy is not good enough, you repeat the process again and again until you have a final, tested model. The more data you provide, the more accurate the machine gets and the more easily it can identify objects. This is how a machine learns. We are surrounded by machine learning techniques. For example, machine learning powers product recommendation: when you shop on Amazon, while browsing a product you will notice a list of products similar to your interests. This is an application of machine learning that Amazon uses to build its recommendation engine, which recommends products for you. Whenever you call Alexa, it performs particular functions, like playing your favourite music or turning on the lights; that is another machine learning application. Machine learning is also used for traffic prediction, estimating the traffic on a particular route from point A to point B.
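The 80/20 split described above can be sketched in a few lines of plain Python. This is only a minimal illustration; in practice, libraries such as scikit-learn ship a ready-made `train_test_split` that does the same job:

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then split them into a training set and a test set."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # seeded shuffle for reproducibility
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]       # 80% train, 20% test by default

# 100 toy samples; any list of labelled examples works the same way
data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Training happens only on `train`; the model never sees `test` until evaluation, which is what makes the accuracy number honest.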

Types of Machine Learning: There are 3 types of machine learning:
  * Supervised Learning
  * Unsupervised Learning
  * Reinforcement Learning
In Supervised Learning, we have a labelled training dataset and the machine trains on those labels. There is a teacher or supervisor who oversees the entire process and trains the machine according to the labels. A use case of supervised learning is the spam classifier: mails are classified as spam or not spam by filters. These filters are constantly updated based on new technologies, new spam identification, and feedback given by Gmail users reporting potential spammers. A text filter uses algorithms to detect the phrases or words most often used in spam, flagging messages based on the sender and their history. Another filtering method is the client filter, which blocks malicious or annoying spam senders. It looks at all the messages a certain user sends out: if the sender constantly sends a huge number of emails, or several of their messages are marked as spam by the text filter, that email address is blocked. The sender is added to a user block list that prevents any inbound messages from that address going forward.
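A toy version of the text filter and block list described above can be written in a few lines. The phrase list, addresses, and threshold here are made up purely for illustration; real spam filters learn their word weights from labelled training mail rather than hard-coding them:

```python
# Phrases often used in spam (hypothetical examples) and a user block list.
SPAM_PHRASES = {"winner", "free money", "act now", "click here"}
BLOCK_LIST = {"known.spammer@example.com"}

def is_spam(sender, message, threshold=2):
    if sender in BLOCK_LIST:            # client filter: blocked sender
        return True
    text = message.lower()
    hits = sum(phrase in text for phrase in SPAM_PHRASES)
    return hits >= threshold            # text filter: too many spam phrases

print(is_spam("friend@example.com", "Lunch tomorrow?"))                 # False
print(is_spam("ads@example.com", "You are a WINNER, click here now!"))  # True
print(is_spam("known.spammer@example.com", "Hello"))                    # True
```

The supervised part is the feedback loop: every time a user marks a message as spam, that labelled example can be used to update the phrase weights and the block list.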
In Unsupervised Learning, there is no teacher and there are no labels. The machine identifies objects by grouping them into clusters: it finds similar items and groups them together. Cluster one consists of similar items, and cluster two consists of other similar items that are not at all related to cluster one. A related use case is Amazon's voice-based personal assistant, Alexa. An Echo device recognizes the wake word "Alexa" and sends the recording over the internet to Amazon. This service is called the Alexa Voice Service, or AVS. For example, if you ask for the time, AVS sends back an audio file telling you the time, which the Echo plays back. AVS is run by Amazon and converts the recording into a command that it interprets; at its simplest, it is a voice-to-text service. Amazon also offers sample code for building an Echo using a Raspberry Pi, and you can set up Philips or Solimo smart lights to be controlled by Alexa, for example to turn on the living room light. Amazon keeps adding features and skills to Alexa, and if you are handy enough you can build your own skills to control devices that are not on the list.
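The clustering idea, grouping similar items with no labels, can be shown with a tiny one-dimensional k-means sketch. This is a deliberately simplified illustration with made-up numbers; real clustering works on many dimensions and uses library implementations such as scikit-learn's `KMeans`:

```python
def kmeans_1d(values, centres, iterations=20):
    """Tiny 1-D k-means: assign each value to its nearest centre,
    then move each centre to the mean of its assigned values, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters

# two obvious groups: small numbers and large numbers
data = [1, 2, 3, 10, 11, 12]
groups = kmeans_1d(data, centres=[0.0, 5.0])
print(groups)  # [[1, 2, 3], [10, 11, 12]]
```

No one told the algorithm which group each number belongs to; the structure emerged from the data itself, which is exactly what "no teacher, no labels" means.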
In Reinforcement Learning, we have an agent and an environment. The agent selects actions according to a policy; there is no teacher training the machine. If the machine makes the right decision, it gets a positive reward; if it makes a wrong decision, it gets a negative reward. By repeating this again and again, the machine learns. For example, a self-driving car initially does not know which way to choose. It takes an action, and if the action is wrong, it gets a penalty, say 50 points. Next time, the car takes the past action into account, updates its policy, and iterates the process. That is reinforcement learning. Autonomous or self-driving cars can be safer than human-driven cars: they are unaffected by human fatigue, emotions, or illness, always active and attentive, observing the environment and scanning in multiple directions. It is difficult for something to happen that the car has not anticipated. Self-driving depends mainly on 3 technologies: IoT sensors, IoT connectivity, and the software that guides the car. There are many types of sensors in a self-driving car, such as sensors for blind-spot monitoring, forward collision warning, radar, and ultrasonics; together these IoT sensors enable the car to navigate. Next, IoT connectivity uses cloud computing to act on traffic data, maps, weather, and surface conditions, among others. This helps the car monitor its surroundings and make informed decisions. The car collects all this data and determines the best course of action. In today's world, Tesla cars analyze the environment using software known as Autopilot. It uses high-tech cameras to view and collect data, much as we do with our eyes; this field is called computer vision, or sophisticated machine cognition.
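The reward-and-penalty loop above can be sketched with a minimal two-action example. The actions, rewards, and learning rate are all made up for illustration; a real agent, like a self-driving car, has a huge state space and uses full reinforcement learning algorithms such as Q-learning:

```python
import random

def learn(episodes=200, seed=0):
    """Tiny reward loop: try actions at random, keep a running score for
    each, and return the action the agent has learned to prefer."""
    rng = random.Random(seed)
    score = {"left": 0.0, "right": 0.0}          # running value of each action
    for _ in range(episodes):
        action = rng.choice(["left", "right"])   # explore by acting randomly
        reward = 1 if action == "right" else -1  # environment's feedback
        score[action] += 0.1 * (reward - score[action])  # update the policy
    return max(score, key=score.get)             # the learned best action

print(learn())  # right
```

The agent is never told that "right" is correct; it discovers this purely from the positive and negative points it accumulates, which is the essence of reinforcement learning.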

Basics of Data Visualization: Everyone uses summary statistics like average sales per customer, the average height of a class, etc. A summary is a useful description of a dataset, but it hides the detail within that data. For example, take Anscombe's quartet, a group of 4 datasets, each consisting of 11 pairs of x and y values. According to the summary statistics, the 4 datasets look nearly identical, but when you plot each x against its y and visualize them, they are completely different. Tables of numbers hide information that a plot reveals. Data visualization helps us explore, understand, and explain our data. Questions like these can be answered with data visualization:
   * How is this data related to that data?
   * How is this data distributed?
   * How is this data made up and how does this data look on a map?
Above all, it depends on the question: what type of answer do I want to find with my data? Comparisons are where you want to see one bit of data against another. For example, how do sales compare across regions?
Charts are great for comparisons. Bar charts are good for categorical data, and time series (line) charts are used when you have a date component. Relationships in data involve one or more measurements and examine how the dimensions affect that relationship, for example, comparing height and weight and seeing how that relationship changes across countries. If you are looking for patterns and outliers in your data, a scatter chart works well, since the position of each data point reflects its measured values for the dimensions in the view. To go beyond summary statistics and look at the distribution of your data, you need to see what shape the data has. Is it clustered around the median? Is it bimodal? Does it skew towards higher or lower values? A histogram is a great way to look at a distribution. Alternative ways of seeing the makeup of data are pie charts, area charts, and treemaps. A good starting point for visualization is to ask yourself what kind of questions you want to ask, then make the right chart choice depending on that data.
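You can check the Anscombe claim yourself with nothing but the standard library. Below are the classic published values for the first two of the four datasets: their means and variances agree to two decimal places, yet a Matplotlib scatter plot of `x` against `y1` versus `y2` would show a noisy line and a smooth curve:

```python
from statistics import mean, variance

# Anscombe's quartet, datasets I and II (classic published values).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Identical summaries, completely different shapes when plotted.
print(round(mean(y1), 2), round(mean(y2), 2))          # 7.5 7.5
print(round(variance(y1), 2), round(variance(y2), 2))  # 4.13 4.13
```

This is the whole point of the quartet: the numbers alone cannot tell the two datasets apart, and only the visualization reveals the difference.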
