Umuc data 620 assignment 1 write a few sentences - what is


Assignment 1-

1. Basic histogram. 

a. Import the data in Table 1  Data Set 10.1.1 - Basic Histogram Data into Tableau and make a basic histogram from it.  To make a histogram, you can follow the directions here.  For this data, use a bin size of 2.  (Your graph may vary a bit from the example below.)  (Online help at https://www.tableau.com/learn/tutorials/on-demand/histograms)

b. Change the y-axis on your histogram to reflect percentages (you can do this in the Quick Table Calculation pull-down from your ROWS variable.)

c. Create a different histogram with the exact same data, with a bin size of 8. 

d. Create a dashboard with the two histograms on it, and submit a screenshot of this dashboard in your TURN IN TEMPLATE.

e. Write a few sentences - what is the difference between the two histograms? Which one would you use under which circumstances?
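
As a rough illustration of why the bin size matters, here is a minimal Python sketch. It is not part of the Tableau workflow. The values are made up (they are not Data Set 10.1.1), the helper name histogram_percent is hypothetical, and it assumes each bin starts at a multiple of the bin size. It groups the values into fixed-width bins and reports each bin as a percentage of the total, once with a bin size of 2 and once with a bin size of 8.

```python
from collections import Counter


def histogram_percent(values, bin_size):
    """Return {bin_start: percent_of_total} for fixed-width bins."""
    # Assumption: a value belongs to the bin that starts at the largest
    # multiple of bin_size that is less than or equal to the value.
    counts = Counter((v // bin_size) * bin_size for v in values)
    total = len(values)
    return {start: 100 * n / total for start, n in sorted(counts.items())}


# Made-up sample values, for illustration only.
values = [1, 2, 3, 5, 6, 8, 9, 12, 13, 15]

narrow = histogram_percent(values, 2)  # many small bins, more detail
wide = histogram_percent(values, 8)    # few large bins, smoother shape

print("bin size 2:", narrow)
print("bin size 8:", wide)
```

With the narrow bins the small-scale ups and downs stay visible; with the wide bins they merge into a couple of bars, which is the trade-off part e asks you to describe.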

2. Histogram with a parameter slider for the bin size.

a. Follow the directions in the histogram video and implement an interactive parameter for the user for the bin size.  Set it so it's a slider.  To do this, you may have to right click on the "choose a bin size" parameter and make sure it's set to slider. 

b. Take two or three screenshots of your sliding bin, with two different bin sizes, and submit them in the TURN IN TEMPLATE.

c. Answer this question: how can the sliding bin size parameter help you as a data analyst?

3. Histograms across Warehouses.  Import the data given in Table 2  Data Set 10.1.2 - Warehouse Histogram Data.

a. Create a histogram of this data with a bin size of 3.  Take a screenshot.

b. Use Tableau to incorporate the additional warehouse information.  We are looking for something like the stacked histogram shown below.  (Your data may vary slightly.)  Here's what I got:

c. Look at how these histograms differ from the overall histogram you created in step a. 

d. Submit a screenshot of the overall histogram and a screenshot of your split by warehouse in the TURN IN TEMPLATE.  (Just put the two screenshots next to each other in the same box.)

e. Answer the following questions: 

f. Question 1:  If you only had the overall histogram, how would you narrate the order delivery time? 

g. Question 2:  If you then saw the by-warehouse split, how would you narrate the order delivery time?  Use the phrases "skewed left" and "skewed right" wherever applicable, and make sure you get them correct (look them up if you need to - the Internet is a great place to start.)

h. Question 3:  You have 30 seconds of the CEO's attention.  What single business action would you recommend to her based on your histograms here?

4. Boxplots. We are going to make some boxplots in Tableau!  Many of you have seen boxplots before; this week, we emphasize the statistical knowledge that can be pulled from a boxplot.

a. Import the data shown in Table 3  Data Set 10.1.3- Sales Data by Time Zone for Boxplots above.  Make a boxplot of this data (online help:  https://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#buildexamples_boxplot.html )

b. To get it to work, you want to make sure your measures aren't aggregated.   To get mine to work, I started with the Time Zone in the Columns, the Sales Volume in the Rows, and then had it make me a bar chart and then I switched the mark type to circles.  (To get a bunch of little circles, make sure the Analysis -> Aggregate Measures box is not checked.)

c. Hover over one of your boxplots and it will show you the actual data.  Here, I'm hovering over the Pacific data. In the TURN IN TEMPLATE, submit a screen print of yourself hovering over one of your boxplots.

d. The boxplots show you at a glance not only the median value (line in the middle) but also the spread and any outliers.  An outlier is something above the top "whisker" or below the bottom "whisker;" on the example chart above, there's an outlier in the Central sales (way at the bottom, where the sales are very close to 0).   Write a sentence or two describing what you see here.  In particular, do you see any differences in median, spread, or quartiles between the regions?  If you wanted to boost sales in one region, which would you pick and why?
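
The numbers behind a boxplot can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sales figures are made up, the helper name box_summary is hypothetical, the whiskers are assumed to reach to 1.5 times the interquartile range beyond the quartiles, and the quartile method used here may not match Tableau's exactly.

```python
import statistics


def box_summary(values):
    """Return the quartiles, the median, and the points beyond the whiskers."""
    # "inclusive" treats the data as the whole population; other quartile
    # conventions give slightly different numbers.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: whiskers reach at most 1.5 * IQR beyond the box.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up sales volumes for one time zone, with one value near 0.
sales = [2, 40, 42, 45, 47, 50, 52, 55, 58]

summary = box_summary(sales)
print(summary)
```

The value near 0 falls below the lower fence, so it is reported as an outlier, much like the Central point described in part d.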

5. Heatmaps.  A heat map conveys numeric information, using colors (or "heat") to show one of the dimensions.  We're going to make one covering those three very important pieces of data about the success metrics for all of us:  IQ, shoe size, and salary.  You can find directions from Tableau here:  https://onlinehelp.tableausoftware.com/v8.0/pro/online/en-us/buildexamples_heatmap.html

Here's how I did it:

a. Import the data in Table 4 Data Set 10.1.5  Success Metrics for Everybody into Tableau.

b. If necessary, move IQ and Shoe Size from Measures to Dimensions

c. Bin IQ into something reasonable (here I chose bin sizes of 10)

d. Make a graph with IQ on one axis and Shoe Size on the other, and use Squares as the marks:

e. Convert Annual Salary to a Continuous Dimension

f. Use Annual Salary as the color and change its Measure to Average:

g. Make your own heatmap from the data here.  Use what you know about colors and graphs to ensure that the color scheme is highlighting what you want to highlight.  Consider the divergent color schemes to bring out interesting data.  Submit a copy of your heat map in the TURN IN TEMPLATE.

h. Write a few sentences - what can you conclude from your heat map?  What, if any, relationships did you find between IQ, shoe size, and salary in this data set?

We're now going to practice on some larger data sets.  Use the following data sets from Tableau to answer the following questions.  Answer the questions using Tableau visualizations.  Please try to answer each question with one and only one Tableau graph (but if you absolutely need more than one, go ahead and be sure to justify why you need it.)

Do not use Excel or other methods to determine your answers.

You can download the data sets from here:  https://public.tableau.com/s/resources?qt-overview_resources=1

6. Millennial vs Baby Boomer Employment Data set.  Use the "National, 5-digit" sheet.    For Baby Boomers, for 2013, what were the three biggest job titles ("Occupation" field) and how many total Baby Boomers worked in those three fields?  Please submit the Tableau screenshot(s) you used to determine this, and give a little narration.  Paste your answers in the TURN IN TEMPLATE.

7. Millennial vs Baby Boomer Employment Data set.  Use the "States" sheet.   In which state was the total Job Change the most negative (i.e. in which state were the most jobs lost between 2007 and 2013?)  How many of those were Boomer jobs vs. Millennial jobs?  Paste your answers in the TURN IN TEMPLATE.

8. Global Sport Finances data sheet, Top Athlete Salaries data.  After you connect to the data, you will need to scrub it up a bit before Tableau can work meaningfully with it.  In particular, I had to do this:

a. After I loaded the "Top Athlete Salaries" sheet, I had to split the salary data.  It comes with an "M" appended at the end of the salary, which means Tableau will view the entire field as a text field.  We want to do numeric analysis on it, so I need to tell Tableau that the salaries are numeric.  First step:  remove the M.

b. Then, I had to convert the split field to a Measure:

c. Next, I had to continue to convince it to treat salary as a number:

d. Scrub your input data as per above.  Make a graph to answer this question:  which sport has the highest *average* pay for 2014?  What was the average pay?  Paste your graph and your answers in the TURN IN TEMPLATE.
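
The answers themselves must come from Tableau, but the cleaning logic in steps a to c can be sketched in Python to make it concrete. This is only an illustration: the rows are made up, the exact text format of the real salary field is an assumption, and the helper names parse_salary_millions and average_pay_by_sport are hypothetical.

```python
from collections import defaultdict


def parse_salary_millions(text):
    """Turn text such as '30 M' or '$45.5M' into a float, in millions."""
    # Assumption: the field is a number with an optional '$' in front
    # and a trailing 'M'; the real data may be formatted differently.
    cleaned = text.strip().rstrip("Mm").strip().lstrip("$").replace(",", "")
    return float(cleaned)


def average_pay_by_sport(rows):
    """rows is an iterable of (sport, salary_text) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # sport -> [sum, count]
    for sport, salary_text in rows:
        entry = totals[sport]
        entry[0] += parse_salary_millions(salary_text)
        entry[1] += 1
    return {sport: total / count for sport, (total, count) in totals.items()}


# Made-up rows, for illustration only.
rows = [("Sport A", "20M"), ("Sport A", "30 M"), ("Sport B", "$45.5M")]

averages = average_pay_by_sport(rows)
top_sport = max(averages, key=averages.get)
print(top_sport, averages[top_sport])
```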

9. Global Sport Finances data sheet, Top Athlete Salaries data.  You notice that Basketball, Cricket, and Soccer all have very similar average earnings for their players in this list, all about $30 M for 2014.  If you were told you would be given the annual salary for one athlete chosen at random from these three categories from this data set, would you choose to be given the salary of a randomly chosen basketball player, a randomly chosen cricket player, or a randomly chosen soccer player?  Why?  Make one graph to answer this question, and paste it in the TURN IN TEMPLATE.

Assignment 2 -

Take the following Python code that stores a string:

str = 'X-DSPAM-Confidence: 0.8475'

Use find and string slicing to extract the portion of the string after the colon character, and then use the float function to convert the extracted string into a floating point number.

The objective here is to isolate the numeric portion of the given string before applying the float() function to turn it into a floating point number. The numeric part appears at the end of the string, so we need to extract the part of the string from one character position after the colon up to the index position that represents the end of the string. The position of the colon within the string can be found using the "find" function. The index position that represents the end of the string can be found using the "len" function on the whole string. Remember that index values begin at 0! Once we have stored the isolated string with the numeric part in the variable strNum, we remove any blank space around it using "strip". The float() function ignores surrounding whitespace, so this step is not strictly required, but it is good practice to clean the extracted string before converting it. For the string above, the program prints 0.8475.
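
The steps just described can be sketched as follows. This is one possible version, not the official solution. The variable is named text here rather than str, an assumption made so that Python's built-in str type is not shadowed; strNum follows the name used in the explanation.

```python
text = 'X-DSPAM-Confidence: 0.8475'

# find returns the index of the first colon (indexes start at 0).
colon_pos = text.find(':')

# Slice from one position after the colon up to the end of the string.
strNum = text[colon_pos + 1:len(text)]

# Remove the blank space left in front of the number.
strNum = strNum.strip()

# Convert the cleaned-up string into a floating point number.
value = float(strNum)
print(value)
```

Writing the slice as text[colon_pos + 1:] gives the same result, because leaving out the end index means "up to the end of the string".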

 TURN IN #1:  Severance Chapter 7 - Exercise 2 - ASSIGNMENT

Write a program to prompt for a file name, and then read through the file and look for lines of the form:

X-DSPAM-Confidence: 0.8475

When you encounter a line that starts with "X-DSPAM-Confidence:" pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

Enter the file name: mbox.txt

Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt

Average spam confidence: 0.750718518519

Test your file on the mbox.txt and mbox-short.txt files.

HINT:

1. Download the two text files: mbox.txt and mbox-short.txt from https://www.pythonlearn.com/code3/ to your local machine. For ease, ensure these files reside in the same folder as the .py file for this assignment.

2. Begin writing your code by prompting the user for the file name. Use a try-except block to exit with a user-friendly error message if there is an error opening the file name specified.

3. Once the file is opened, use an iterative loop (e.g. "for" or "while") to traverse each line of the file.

4. (A quick manual exploration of mbox text files reveals that the number representing spam confidence is found at the end of the line.) In each line, find the pattern "X-DSPAM-Confidence:". If this is found, extract the portion of the line after this pattern until the end of the line. The "find", string extraction and "strip" functions are useful here.

5. Convert the numeric part extracted from the line into a float.

6. When the program has finished traversing each line in the specified file, total the number of lines that had the pattern and compute average spam confidence.

7. Note: In your calculation for average spam confidence, do NOT count the lines that did not contain the pattern. Be sure to comment your program adequately!

8. Good programmers test all cases, so you should make sure you test for erroneous inputs and also test on the mbox-short.txt and the mbox.txt files.
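
The hints above can be put together roughly as follows. This is a minimal sketch, not the graded solution: the names PREFIX, average_spam_confidence and main are hypothetical, the error messages are made up, and the sample lines are invented so that the logic can be tried without downloading the mbox files. It uses startswith to match lines that begin with the pattern, as the exercise statement requires, and find to locate the colon.

```python
PREFIX = "X-DSPAM-Confidence:"


def average_spam_confidence(lines):
    """Average the confidence values on lines that start with PREFIX.

    Returns None when no line matches, so the caller never divides by zero.
    """
    count = 0
    total = 0.0
    for line in lines:
        # Skip every line that does not start with the pattern.
        if not line.startswith(PREFIX):
            continue
        colon_pos = line.find(":")
        # Everything after the colon, without spaces or the newline.
        total += float(line[colon_pos + 1:].strip())
        count += 1
    if count == 0:
        return None
    return total / count


def main():
    """Prompt for a file name and print the average spam confidence."""
    file_name = input("Enter the file name: ")
    try:
        with open(file_name) as handle:
            average = average_spam_confidence(handle)
    except OSError:
        print("File cannot be opened:", file_name)
        return
    if average is None:
        print("No X-DSPAM-Confidence lines found in", file_name)
    else:
        print("Average spam confidence:", average)


# Invented sample lines, used to try the logic without a file.
sample_lines = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: hello\n",
    "X-DSPAM-Confidence: 0.6525\n",
]
print(average_spam_confidence(sample_lines))
```

To run it against mbox.txt or mbox-short.txt, call main() and type the file name at the prompt.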

When you are ready to check your DSPAM code, open the Assignment 10.2 DSPAM Grading.  It will give you a new mbox file to download. Run your DSPAM code on the new file, and enter the average spam confidence.

