library(tidyverse)
Project Proposal
Team 2 Proposal
Data 1
Introduction and data
The dataset comes from the World Health Organization and covers factors that relate to life expectancy, including mortality, immunization, and socioeconomic factors. As stated by the World Health Organization, the data is collected through household surveys, civil registrations, and institutional sources such as health facilities, and is then curated into the Global Health Observatory database. The dataset has data from all countries collected from 2000 to 2015. Each observation is a country’s health status for a given year, and it includes variables such as life expectancy, mortality rates, country development status, schooling, disease rates, and alcohol consumption.
Research question
Question: Which health and socioeconomic factors most greatly affect the life expectancy of a country, and does this differ between developed and developing countries?
The research topic is life expectancy and the factors that influence it, as well as how these factors differ between countries. Our hypothesis is that developed countries will on average have higher life expectancy than developing countries, and that the factors most correlated with life expectancy are immunization coverage and income composition of resources. There are both categorical and quantitative variables involved in our research topic. Development status is a categorical variable, while life expectancy, immunization coverage, income composition of resources, disease rates, mortality rates, and schooling are quantitative variables.
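A first descriptive look at this hypothesis could be sketched as follows. This is a minimal base R sketch under stated assumptions, not the final analysis: the helper name `summarise_life_expectancy` is hypothetical, the column names are taken from the glimpse of the data, and the small `toy` data frame holds made-up values purely so the example runs on its own.

```r
# Hypothetical helper: mean life expectancy by development status, plus the
# correlation of each candidate factor with life expectancy.
summarise_life_expectancy <- function(df, factors) {
  # Mean life expectancy per Status group, ignoring missing values
  means <- tapply(df$Life.expectancy, df$Status, mean, na.rm = TRUE)

  # Pearson correlation of each factor with life expectancy,
  # using only rows where both values are present
  cors <- sapply(factors, function(f) {
    cor(df[[f]], df$Life.expectancy, use = "complete.obs")
  })

  list(mean_by_status = means, correlations = sort(cors, decreasing = TRUE))
}

# Toy data with made-up values (NOT taken from the WHO dataset)
toy <- data.frame(
  Status = c("Developed", "Developed", "Developing", "Developing", "Developing"),
  Life.expectancy = c(81, 79, 62, 58, 65),
  Polio = c(95, 93, 70, 60, 75),
  Income.composition.of.resources = c(0.90, 0.88, 0.50, 0.45, 0.55),
  Schooling = c(16, 15, 9, 8, 10)
)

result <- summarise_life_expectancy(
  toy,
  factors = c("Polio", "Income.composition.of.resources", "Schooling")
)
print(result)
```

On the full data, the same helper could be called with `life_expectancy` and whichever columns we treat as candidate factors. To address the second half of the question, it could be run separately on the developed and developing subsets. Because each country appears once per year, pooled correlations treat repeated measurements as independent, so this is only a descriptive first pass.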
Literature
Article: https://www.who.int/news/item/19-05-2016-life-expectancy-increased-by-5-years-since-2000-but-health-inequalities-persist
This article from the World Health Organization provides a broad overview of the disparities in life expectancy between different countries. Although life expectancy fluctuated more in the 20th century due to political circumstances, it had largely stabilized (for most countries) by the turn of the 21st century. Globally, the life expectancy for children born in 2015 was 71.4 years. However, for high-income countries, life expectancy is closer to 80 years or more, and for low-income countries, life expectancy is under 60 years. Despite dramatic gains in life expectancy in the last two decades, these major inequalities still persist. This report focuses on access to health services as a major determining factor for a nation’s life expectancy. Complications with pregnancy and childbirth, HIV incidence, malaria outbreaks, cardiovascular disease, cancer, lack of clean water, etc. represent just some of the risk factors that stem from or are exacerbated by underlying issues with healthcare infrastructure.
Research Question: Our research question builds on the WHO article by determining which risk factors have the greatest, most direct impact on life expectancy for different countries. Additionally, the data set includes variables that are not directly mentioned in the article, so bringing more factors to the discussion regarding disparities in life expectancy provides us with a more nuanced outlook on the topic.
Glimpse of data
life_expectancy <- read.csv("data/Life Expectancy Data.csv")
glimpse(life_expectancy)
Rows: 2,938
Columns: 22
$ Country <chr> "Afghanistan", "Afghanistan", "Afghani…
$ Year <int> 2015, 2014, 2013, 2012, 2011, 2010, 20…
$ Status <chr> "Developing", "Developing", "Developin…
$ Life.expectancy <dbl> 65.0, 59.9, 59.9, 59.5, 59.2, 58.8, 58…
$ Adult.Mortality <int> 263, 271, 268, 272, 275, 279, 281, 287…
$ infant.deaths <int> 62, 64, 66, 69, 71, 74, 77, 80, 82, 84…
$ Alcohol <dbl> 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.…
$ percentage.expenditure <dbl> 71.279624, 73.523582, 73.219243, 78.18…
$ Hepatitis.B <int> 65, 62, 64, 67, 68, 66, 63, 64, 63, 64…
$ Measles <int> 1154, 492, 430, 2787, 3013, 1989, 2861…
$ BMI <dbl> 19.1, 18.6, 18.1, 17.6, 17.2, 16.7, 16…
$ under.five.deaths <int> 83, 86, 89, 93, 97, 102, 106, 110, 113…
$ Polio <int> 6, 58, 62, 67, 68, 66, 63, 64, 63, 58,…
$ Total.expenditure <dbl> 8.16, 8.18, 8.13, 8.52, 7.87, 9.20, 9.…
$ Diphtheria <int> 65, 62, 64, 67, 68, 66, 63, 64, 63, 58…
$ HIV.AIDS <dbl> 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1…
$ GDP <dbl> 584.25921, 612.69651, 631.74498, 669.9…
$ Population <dbl> 33736494, 327582, 31731688, 3696958, 2…
$ thinness..1.19.years <dbl> 17.2, 17.5, 17.7, 17.9, 18.2, 18.4, 18…
$ thinness.5.9.years <dbl> 17.3, 17.5, 17.7, 18.0, 18.2, 18.4, 18…
$ Income.composition.of.resources <dbl> 0.479, 0.476, 0.470, 0.463, 0.454, 0.4…
$ Schooling <dbl> 10.1, 10.0, 9.9, 9.8, 9.5, 9.2, 8.9, 8…
Data 2
Introduction and data
The data comes from the CORGIS Dataset Project and was originally collected from the Coffee Quality Database. The dataset describes both Arabica and Robusta beans from 2010 to 2018. The beans come from many countries and are rated on a continuous 0-100 scale. Each observation is a type of coffee bean from a specific farm, with measured variables such as acidity, sweetness, and fragrance. Other variables in the dataset describe the origins of the beans, such as the altitude of the farm where the beans were collected.
Research question
Research question: Do Robusta or Arabica beans have a higher rating?
The research topic is the rating of coffee beans. Each bean receives a continuous 0-100 overall rating that depends on various factors such as acidity, sweetness, and fragrance. The research topic explores how the type of coffee bean relates to these factors and, through them, to the overall enjoyability of the coffee bean.
I hypothesize that Arabica beans will have a higher rating. According to the Department of Agriculture, Arabica beans account for 60% of the coffee beans produced worldwide, so I expect the more widely produced bean to be of higher quality.
Source: https://usda.library.cornell.edu/usda/fas/tropprod//2010s/2017/tropprod-06-16-2017.pdf
The variables in my research question are both categorical and quantitative. The categorical variables are the country of origin of the coffee bean, whether the bean is Robusta or Arabica, and the region the coffee bean is from. The quantitative variables are the altitude of the farm where the coffee bean is produced, the number of bags of the coffee bean produced, and the scores of the coffee bean for flavor, sweetness, acidity, fragrance, uniformity, and moisture (continuous scale of 1-10).
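The comparison behind this research question could start with a per-species summary of the total score. The sketch below is a minimal base R example under stated assumptions: the helper name `compare_species` is hypothetical, the column names come from the glimpse of the data, and the `toy` data frame contains made-up scores so the example is self-contained.

```r
# Hypothetical helper: number of rows, mean, and median of the total score
# for each species.
compare_species <- function(df) {
  # One vector of total scores per species (groups sorted alphabetically)
  scores <- split(df$Data.Scores.Total, df$Data.Type.Species)

  data.frame(
    species = names(scores),
    n = sapply(scores, length),
    mean_total = sapply(scores, mean, na.rm = TRUE),
    median_total = sapply(scores, median, na.rm = TRUE),
    row.names = NULL
  )
}

# Toy data with made-up values (NOT taken from the coffee dataset)
toy <- data.frame(
  Data.Type.Species = c("Arabica", "Arabica", "Arabica", "Robusta", "Robusta"),
  Data.Scores.Total = c(86.25, 84.00, 82.50, 81.00, 80.50)
)

species_summary <- compare_species(toy)
print(species_summary)
```

On the real data the call would be `compare_species(coffee)`. The `n` column is worth checking first: if one species has far fewer rows than the other, a formal comparison such as `t.test(Data.Scores.Total ~ Data.Type.Species, data = coffee)` should be read with that imbalance in mind.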
Literature
The article I chose is “Arabica vs. Robusta: No Contest” by Jerry Baldwin, which was featured in The Atlantic. In the article, Baldwin explains why Arabica beans are better than Robusta beans, supporting his argument with a historical breakdown of the origins of Arabica and Robusta beans. Specifically, Robusta beans were created to be mixed in with Arabica beans to lower the overall cost of the beans per bag. Additionally, Baldwin notes that Robusta beans are typically more mass-produced than Arabica beans, which often sacrifices the quality and taste of the bean in favor of volume. As a result, Baldwin concludes that Arabica beans have a better taste and quality than Robusta beans. However, Baldwin says that this is a generalization and may vary from bean to bean. Our research question builds on the article by replacing qualitative reasoning based on historical context with quantitative data. It uses ratings of the different factors that affect the quality of the bean (flavor, acidity, sweetness, etc.) to see whether there is a significant quantitative difference in ratings between Arabica and Robusta beans.
Link: https://www.theatlantic.com/health/archive/2009/06/arabica-vs-robusta-no-contest/19780/
Glimpse of data
coffee <- read.csv("data/coffee.csv")
glimpse(coffee)
Rows: 989
Columns: 23
$ Location.Country <chr> "United States", "Brazil", "Brazil", "E…
$ Location.Region <chr> "kona", "sul de minas - carmo de minas"…
$ Location.Altitude.Min <int> 0, 12, 12, 0, 0, 0, 1300, 0, 0, 640, 12…
$ Location.Altitude.Max <int> 0, 12, 12, 0, 0, 0, 1400, 0, 0, 1400, 1…
$ Location.Altitude.Average <int> 0, 12, 12, 0, 0, 0, 1350, 0, 0, 1020, 1…
$ Year <int> 2010, 2010, 2010, 2010, 2010, 2010, 201…
$ Data.Owner <chr> "kona pacific farmers cooperative", "ja…
$ Data.Type.Species <chr> "Arabica", "Arabica", "Arabica", "Arabi…
$ Data.Type.Variety <chr> "nan", "Yellow Bourbon", "Yellow Bourbo…
$ Data.Type.Processing.method <chr> "nan", "nan", "nan", "nan", "nan", "nan…
$ Data.Production.Number.of.bags <int> 25, 300, 300, 360, 300, 12, 10, 360, 30…
$ Data.Production.Bag.weight <dbl> 45.35920, 60.00000, 60.00000, 6.00000, …
$ Data.Scores.Aroma <dbl> 8.25, 8.17, 8.42, 7.67, 7.58, 7.50, 7.6…
$ Data.Scores.Flavor <dbl> 8.42, 7.92, 7.92, 8.00, 7.83, 7.92, 7.5…
$ Data.Scores.Aftertaste <dbl> 8.08, 7.92, 8.00, 7.83, 7.58, 7.42, 7.5…
$ Data.Scores.Acidity <dbl> 7.75, 7.75, 7.75, 8.00, 8.00, 7.67, 7.5…
$ Data.Scores.Body <dbl> 7.67, 8.33, 7.92, 7.92, 7.83, 7.83, 7.6…
$ Data.Scores.Balance <dbl> 7.83, 8.00, 8.00, 7.83, 7.50, 7.58, 7.5…
$ Data.Scores.Uniformity <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.0…
$ Data.Scores.Sweetness <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.0…
$ Data.Scores.Moisture <dbl> 0.00, 0.08, 0.01, 0.00, 0.10, 0.01, 0.0…
$ Data.Scores.Total <dbl> 86.25, 86.17, 86.17, 85.08, 83.83, 83.4…
$ Data.Color <chr> "Unknown", "Unknown", "Unknown", "Unkno…
Data 3
Introduction and data
The county_demographics data was taken from the CORGIS Datasets Project. The information was collected by the United States Census Bureau about counties in the United States from 2010 through 2019. Each of the 3,139 observations is a county, with data on age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics, including the number of women-, minority-, and veteran-owned firms.
Research question
This research question explores how the attitudes of different age groups manifest in economic development. We hypothesize that, since younger people tend to be more supportive of diversity, equity, and inclusion, counties with a lower median age will have a higher number of women- and minority-owned firms.
The variables in this research question are quantitative, with median age and number of women-/minority-owned firms all taking numerical values.
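The relationship described here could be examined with a sketch like the one below. Several things in it are assumptions rather than part of the proposal: the glimpse of the data does not list a median age column, so `Age.Percent.65.and.Older` is used as a stand-in age measure; firm counts are divided by `Employment.Firms.Total` so that large counties do not dominate; the helper name `firm_share_vs_age` is hypothetical; and the `toy` data frame holds made-up values so the example runs on its own.

```r
# Hypothetical helper: correlation between an age measure and the share of
# minority- and women-owned firms in each county.
firm_share_vs_age <- function(df, age_col = "Age.Percent.65.and.Older") {
  # Drop counties with a missing or non-positive firm total
  # to avoid dividing by zero
  keep <- !is.na(df$Employment.Firms.Total) & df$Employment.Firms.Total > 0
  df <- df[keep, ]

  minority_share <- df$Employment.Firms.Minority.Owned / df$Employment.Firms.Total
  women_share <- df$Employment.Firms.Women.Owned / df$Employment.Firms.Total

  c(
    minority = cor(df[[age_col]], minority_share, use = "complete.obs"),
    women = cor(df[[age_col]], women_share, use = "complete.obs")
  )
}

# Toy data with made-up values (NOT taken from the county dataset);
# the last row has zero firms and is dropped by the helper
toy <- data.frame(
  Age.Percent.65.and.Older = c(22.4, 15.0, 12.0, 18.5, 0),
  Employment.Firms.Total = c(1000, 2000, 4000, 1500, 0),
  Employment.Firms.Minority.Owned = c(100, 400, 1200, 225, 0),
  Employment.Firms.Women.Owned = c(300, 700, 1600, 480, 0)
)

age_cor <- firm_share_vs_age(toy)
print(age_cor)
```

Under our hypothesis, both correlations would be negative on the real data when the age measure is the percentage of older residents. If a true median age variable becomes available, it can be passed through `age_col` instead.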
Literature
Article: https://www.pewresearch.org/social-trends/2023/05/17/diversity-equity-and-inclusion-in-the-workplace/
Summary: The article covers the rise in attention toward diversity, equity, and inclusion (DEI) in the workplace and how opinions of it differ across demographic groups. Most workers have experienced DEI measures at work, and most also say these measures have a positive impact, but not many place a lot of importance on them. Women are 11% more likely than men to see DEI as good, and Black, Hispanic, Asian, and White workers, in that order, have the most positive opinions of DEI. The strongest indicator by far is political lean: Republicans are equally split between positive and negative opinions, while Democrats overwhelmingly support DEI. In terms of age groups, the general trend is that younger employed adults tend to show greater support for DEI. Workers under 30 show the greatest support, at 68%, and support generally declines into the 40s-50s percent range in the upper age ranges. Workers generally believe their employer pays the right amount of attention to DEI. About one third of workers believe racial or ethnic diversity and age diversity are important, with slightly less support for gender and sexual orientation diversity. About half of workers also value accessibility. Overall, workers who are women, Black, Democratic, or young tend to show the greatest support for DEI.
Research Question: Our research question builds on this article because the article indicates that the younger population supports workplace diversity more, and we want to find out whether the median age of a location is correlated with minority ownership of businesses. Extrapolating from the finding that younger demographics support diversity more, we would expect to see higher proportions of minority ownership in younger areas.
Glimpse of data
county_demographics <- read.csv("data/county_demographics.csv")
glimpse(county_demographics)
Rows: 3,139
Columns: 43
$ County <chr> "Abbevill…
$ State <chr> "SC", "LA…
$ Age.Percent.65.and.Older <dbl> 22.4, 15.…
$ Age.Percent.Under.18.Years <dbl> 19.8, 25.…
$ Age.Percent.Under.5.Years <dbl> 4.7, 6.9,…
$ Education.Bachelor.s.Degree.or.Higher <dbl> 15.6, 13.…
$ Education.High.School.or.Higher <dbl> 81.7, 79.…
$ Employment.Nonemployer.Establishments <int> 1416, 453…
$ Ethnicities.American.Indian.and.Alaska.Native.Alone <dbl> 0.3, 0.4,…
$ Ethnicities.Asian.Alone <dbl> 0.4, 0.3,…
$ Ethnicities.Black.Alone <dbl> 27.6, 18.…
$ Ethnicities.Hispanic.or.Latino <dbl> 1.6, 2.8,…
$ Ethnicities.Native.Hawaiian.and.Other.Pacific.Islander.Alone <dbl> 0.0, 0.0,…
$ Ethnicities.Two.or.More.Races <dbl> 1.4, 1.6,…
$ Ethnicities.White.Alone <dbl> 70.2, 79.…
$ Ethnicities.White.Alone..not.Hispanic.or.Latino <dbl> 68.9, 77.…
$ Housing.Homeownership.Rate <dbl> 75.0, 71.…
$ Housing.Households <int> 9660, 222…
$ Housing.Housing.Units <int> 12245, 26…
$ Housing.Median.Value.of.Owner.Occupied.Units <int> 90800, 11…
$ Housing.Persons.per.Household <dbl> 2.46, 2.7…
$ Income.Median.Houseold.Income <int> 38741, 43…
$ Income.Per.Capita.Income <int> 22646, 23…
$ Miscellaneous.Foreign.Born <dbl> 1.8, 1.3,…
$ Miscellaneous.Land.Area <dbl> 490.48, 6…
$ Miscellaneous.Language.Other.than.English.at.Home <dbl> 2.8, 10.0…
$ Miscellaneous.Living.in.Same.House..1.Years <dbl> 90.4, 89.…
$ Miscellaneous.Manufacturers.Shipments <int> 598825, -…
$ Miscellaneous.Mean.Travel.Time.to.Work <dbl> 28.1, 28.…
$ Miscellaneous.Percent.Female <dbl> 51.6, 51.…
$ Miscellaneous.Veterans <int> 1559, 269…
$ Population.2020.Population <int> 24295, 57…
$ Population.2010.Population <int> 25417, 61…
$ Population.Population.per.Square.Mile <dbl> 51.8, 94.…
$ Sales.Accommodation.and.Food.Services.Sales <int> 12507, 52…
$ Sales.Retail.Sales <int> 91371, 60…
$ Employment.Firms.Total <int> 1450, 466…
$ Employment.Firms.Women.Owned <int> 543, 1516…
$ Employment.Firms.Men.Owned <int> 689, 2629…
$ Employment.Firms.Minority.Owned <int> 317, 705,…
$ Employment.Firms.Nonminority.Owned <int> 1080, 373…
$ Employment.Firms.Veteran.Owned <int> 187, 388,…
$ Employment.Firms.Nonveteran.Owned <int> 1211, 400…