Welcome to the world of Python!! Beautiful is better than ugly!!

Hey Folks!!!
 
Welcome back..... I am pursuing a further specialization in business analytics, and this blog is about working with Python on the RadishSurvey data set. Just sit back, relax, and follow along with the simple steps to find some amazing analysis...... ;p It's like reading a story.
We all love food. In today's analysis we have a survey of people who like radishes (different varieties). We would like to understand who loves which variety, plus a few more insights.
Ok!!! So ready? Let's start.
 
1)      Data Loading
Welcome to the world of Python. Today let us understand how we work with strings. We have a data set with 300 lines of survey data; each line consists of a name and a radish variety.
Source of data: “http://opentechschool.github.io/python-data-intro/files/radishsurvey.txt”
2)      Reading the data
Data can be read with a few lines of Python code.
Brief Understanding: We read the survey data and, using the split function, separate the name and the variety on the dash (-) between them, then print each result as 'name' voted for 'Radish Variety'. Do not forget to change the file path in your code while implementing.
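Here is a minimal sketch of this reading step. The sample lines, the function name describe_votes and the exact " - " separator are my assumptions for illustration; the real file may differ slightly.

```python
# Made-up sample lines standing in for radishsurvey.txt.
# The "Name - Variety" layout with " - " as separator is an assumption.
sample_lines = [
    "Asha Rao - White Icicle\n",
    "Ben Cole - Daikon\n",
]


def describe_votes(lines):
    """Return one 'name voted for variety' sentence per survey line."""
    sentences = []
    for line in lines:
        line = line.strip()          # drop the trailing newline
        parts = line.split(" - ")    # ["name", "variety"]
        name = parts[0]
        vote = parts[1]
        sentences.append(name + " voted for " + vote)
    return sentences


for sentence in describe_votes(sample_lines):
    print(sentence)

# With the real file, pass the open file instead (change the path to yours):
# with open("radishsurvey.txt") as survey:
#     for sentence in describe_votes(survey):
#         print(sentence)
```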
3)      Inspecting and Counting Votes

Suppose we want the names of the voters and the number of votes for the White Icicle variety.

Code: For counting and listing the names of the people who like the White Icicle radish. We read the survey using the open statement and separate each line with the split function. Compared to the earlier scenario, we now make use of multiple assignment: the statement name, vote = parts assigns the first element of parts to name and the second to vote.

Meanwhile, the strip() function removes the newline from the end of each line. Suppose the line was “hello-mam\n”; it would become “hello-mam”.
 
We use an if statement to compare the vote to White Icicle. If they are the same, we increment the count variable by 1; it was initialized to 0 at the start of the programme.
We get the final count of people who like White Icicle through the print function.
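The steps above can be sketched roughly like this. The sample lines and the function name white_icicle_voters are made up for illustration, and the " - " separator is an assumption about the file layout.

```python
# Made-up sample lines standing in for radishsurvey.txt ("Name - Variety" is assumed).
sample_lines = [
    "Asha Rao - White Icicle\n",
    "Ben Cole - Daikon\n",
    "Chen Wu - White Icicle\n",
]


def white_icicle_voters(lines):
    """Return the names of the White Icicle voters and how many there are."""
    count = 0                        # initialized to 0 before the loop
    names = []
    for line in lines:
        line = line.strip()          # "name - vote\n" -> "name - vote"
        parts = line.split(" - ")
        name, vote = parts           # multiple assignment from the two parts
        if vote == "White Icicle":
            names.append(name)
            count += 1
    return names, count


names, count = white_icicle_voters(sample_lines)
print(names)
print("Votes for White Icicle:", count)
```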
 
 
The result: 59 people like White Icicle.
Moving forward, we know there are a lot of radish varieties. The major problem is that, to get the count for each of the other varieties, we need to repeat the same code and change the name of the variety. Such a tedious task!!! So what to do? Follow Step 4.
4)      Making a generic function to count the number of people who like the different varieties of radishes
Here we have created a generic function that takes the variety whose votes we want to count as an argument. For example, we passed White Icicle, Daikon and Sicily Giant as arguments and received 59, 58 and 52 respectively.
You can change the argument to whichever variety you want the count for.
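A generic function along these lines might look like the sketch below. The name count_votes, its parameters and the sample lines are assumptions for illustration, so the counts it prints are for the sample only, not the real survey.

```python
# Made-up sample lines standing in for radishsurvey.txt ("Name - Variety" is assumed).
sample_lines = [
    "Asha Rao - White Icicle\n",
    "Ben Cole - Daikon\n",
    "Chen Wu - White Icicle\n",
    "Dev Nair - Sicily Giant\n",
]


def count_votes(lines, radish):
    """Count how many survey lines voted for the given radish variety."""
    count = 0
    for line in lines:
        name, vote = line.strip().split(" - ")
        if vote == radish:
            count += 1
    return count


for variety in ["White Icicle", "Daikon", "Sicily Giant"]:
    print(variety, count_votes(sample_lines, variety))
```

With the real file, read the lines into a list first (or reopen the file on every call), because an open file can only be looped over once.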
 
What is the problem here? Scratch your head!!!
 
Actually, unless you know the name of a variety, you cannot find its count. Memorizing the names is a very tedious and difficult task, so we need an alternative: a dictionary that saves all the names.
5)      Counting the votes for all the categories
 
 
With the code for this step, one can count the votes for all the categories in a single pass.

We create a dictionary of radish names along with their vote counts.
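As a rough sketch of the dictionary idea (the function name count_all_votes and the sample lines are assumed for illustration): each variety becomes a key, and its value is the number of votes seen so far.

```python
# Made-up sample lines standing in for radishsurvey.txt ("Name - Variety" is assumed).
sample_lines = [
    "Asha Rao - White Icicle\n",
    "Ben Cole - Daikon\n",
    "Chen Wu - White Icicle\n",
    "Dev Nair - Red King\n",
    "Ela Roy - red King\n",
]


def count_all_votes(lines):
    """Return a dictionary mapping each variety, exactly as written, to its votes."""
    counts = {}
    for line in lines:
        name, vote = line.strip().split(" - ")
        if vote not in counts:
            counts[vote] = 1         # first vote for this variety
        else:
            counts[vote] += 1        # one more vote for a known variety
    return counts


for variety, votes in count_all_votes(sample_lines).items():
    print(variety + ":", votes)
```

Notice that no variety names had to be typed in advance; the dictionary picks them up from the data.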
Moving ahead, there are a few problems with the code. For example, Red King and red King are counted as 2 different categories when they are actually one, so we need to clean the data. We use a function named str.capitalize(), which makes the 1st letter of a string capital and the rest lowercase, so both spellings become Red king.


 
Here we have used the statement vote = vote.strip().capitalize(), which applies the strip and capitalize functions to vote and assigns the result back to the vote variable.
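To see the cleaning in action, here is a small sketch. The function name count_clean_votes and the sample lines are assumptions; only the vote = vote.strip().capitalize() statement comes from the step above.

```python
# Made-up sample lines standing in for radishsurvey.txt ("Name - Variety" is assumed).
sample_lines = [
    "Dev Nair - Red King\n",
    "Ela Roy - red King\n",
    "Ben Cole - Daikon\n",
]


def count_clean_votes(lines):
    """Count votes per variety after normalizing the spelling of each vote."""
    counts = {}
    for line in lines:
        name, vote = line.strip().split(" - ")
        # "Red King" and "red King" both become "Red king"
        vote = vote.strip().capitalize()
        counts[vote] = counts.get(vote, 0) + 1
    return counts


print(count_clean_votes(sample_lines))
```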
We are nearing the end of the solution.
We still have an issue with double voting, so the last piece of cleaning code tackles it.
 
This gives us the list of people who have voted twice and a fresh count for each radish category.
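One possible way to handle double voting is sketched below. Keeping only a person's first vote is my assumption (the survey rules are not stated here), and the names count_unique_votes, voted and repeat_voters are hypothetical.

```python
# Made-up sample lines standing in for radishsurvey.txt ("Name - Variety" is assumed).
sample_lines = [
    "Asha Rao - White Icicle\n",
    "Ben Cole - Daikon\n",
    "Asha Rao - Daikon\n",           # Asha votes a second time
]


def count_unique_votes(lines):
    """Count only each person's first vote and list anyone who voted again."""
    counts = {}
    voted = []                       # names we have already seen
    repeat_voters = []               # names that showed up more than once
    for line in lines:
        name, vote = line.strip().split(" - ")
        name = name.strip()
        vote = vote.strip().capitalize()
        if name in voted:
            repeat_voters.append(name)
            continue                 # assumption: ignore the extra vote
        voted.append(name)
        counts[vote] = counts.get(vote, 0) + 1
    return counts, repeat_voters


counts, repeat_voters = count_unique_votes(sample_lines)
print("Voted more than once:", repeat_voters)
print("Fresh counts:", counts)
```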
6)      Find out the winner
Everyone is interested in finding the winner. The Winner Stands Alone!! One of the famous books of Paulo Coelho. Let's see how the code does it:
The loop keeps track of one name, winner_name, and the number of votes cast for it. On each iteration it checks whether there is a name with more votes than the current winner, and updates the winner if so.
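The loop described here could look something like this. The counts dictionary is made up, and the names find_winner and winner_votes are assumptions; only winner_name is named in the text.

```python
# Made-up vote counts for illustration; real counts come from the cleaned survey data.
sample_counts = {"White icicle": 3, "Daikon": 5, "Red king": 2}


def find_winner(counts):
    """Return the variety with the most votes and its vote count."""
    winner_name = "No winner"
    winner_votes = 0
    for name in counts:
        # A variety with more votes than the current winner takes over
        if counts[name] > winner_votes:
            winner_votes = counts[name]
            winner_name = name
    return winner_name, winner_votes


winner_name, winner_votes = find_winner(sample_counts)
print("The winner is", winner_name, "with", winner_votes, "votes")
```

In case of a tie, this sketch keeps whichever variety reached the top count first.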
Please check the second blog to find out how we worked with charts on the same data set....
Happy Analytics :)
 
 
