Oceans and Rivers:

The Global Garbage Cans



An analysis of the status of SDG 6:
Clean Water and Sanitation in the Philippines.



We are the Water Boys and we’re diving deep into data to surface solutions for sustainable water in the Philippines.


Here's an overview of our project.


Our project is an analysis of the status of SDG 6: Clean Water and Sanitation in the Philippines. The Water Boys aim to use their data science knowledge to analyze water quality across the different regions of the Philippines, find new insights, and share them with the country.

Motivation

The Philippines depends on water as a source of life, livelihood, and sanitation, but as Jacques Cousteau, the co-inventor of the first scuba set, once said, “Water and air, the two essential fluids on which all life depends, have become global garbage cans.” This is why we decided to focus on SDG 6, as all life depends on clean water.

Problem

Access to clean water and sanitation is a critical issue in the Philippines, with many regions facing challenges related to water quality, availability, and sanitation facilities.

Solution

Utilize data science techniques to analyze the status of SDG 6: Clean Water and Sanitation in the Philippines, and uncover insights about water quality, sanitation practices, and their implications for public health and environmental sustainability.


Water has always been an integral part of the Philippines as it is surrounded by abundant water resources.

These bodies of water are essential in sustaining life, ecosystems, and the livelihoods of many. However, despite the ample water resources, the Philippines still faces water stress, or a lack of clean and accessible water[1]. The quality of these water resources has long been under threat from various natural phenomena and human activities, especially with the rapid increase in population, urbanization, and industrialization[2]. Moreover, the Philippines ranks among the world's top contributors to water pollution, which poses great threats to human health, agriculture, fisheries, and tourism[3]. This study strives to deepen our understanding of the various pollutants in the Philippines’ bodies of water, ultimately paving the way for improved water preservation.

BACKGROUND



This led us to ask two questions.



1. How do water quality parameters vary across different regions and water bodies in the Philippines?


Null Hypothesis


There is no significant difference in water quality parameters across different regions and water bodies in the Philippines.

Alternative Hypothesis


pH levels, dissolved oxygen concentrations, turbidity, etc., vary considerably depending on geographic location, water use practices, and pollution sources.


2. How have the water quality parameters for each waterbody changed from 2019 to 2021?


Null Hypothesis


There is no significant change in the water quality parameters in the various water bodies from 2019 to 2021.

Alternative Hypothesis


There is a significant change in the water quality parameters in the various water bodies from 2019 to 2021.

So what was the plan of action?


1. Data Collection: Gather data on water quality parameters from various sources. Collect information on parameters such as BOD, chloride, color, DO, fecal coliform, nitrate, pH, phosphate, temperature, and TSS.

2. Data Analysis: Analyze the collected data to understand the distribution and trends of water quality parameters across different regions and water bodies in the Philippines. Use statistical methods to assess variations and identify factors influencing water quality.

3. Interpretation: Interpret the findings to determine the implications of water quality parameters variations for environmental health and public safety. Discuss the potential impact on ecosystems, public health, and economic activities dependent on clean water sources.

4. Recommendation: Provide recommendations for water resource management, pollution control measures, and policy interventions aimed at addressing identified water quality issues and safeguarding environmental health and public safety.

5. Presentation: Present the findings and recommendations in a clear and accessible manner through reports, presentations, or visualizations. Communicate the importance of addressing water quality challenges to achieve SDG 6 targets and promote sustainable development in the Philippines.



How did we collect our data?




We found three data sources from DENR - Environmental Management Bureau.



Each dataset monitors the ambient water quality of different freshwater bodies in the Philippines. The data cover the years 2019 to 2021, and only the water bodies with records available in all three years were considered when building our dataset.



View our dataset here
View our data collection document here

Methodology



Our methodology encompasses a systematic approach to collecting and analyzing data. We employ rigorous statistical techniques to test our hypotheses, ensuring the reliability and validity of our findings. This process includes detailed data pre-processing procedures, analytical methods, and tools used to interpret the results within the context of our research objectives.




1. Pre-processing


Data preprocessing is a crucial step in our methodology, involving the cleaning and transformation of raw data into a format suitable for analysis. This phase ensures accuracy by addressing missing values, outliers, and inconsistencies, and by normalizing and scaling the data to prepare it for subsequent statistical testing.



a. Dataset

The source data cover the years 2019 to 2021. Using purposive sampling, we kept only the water bodies with records available in all three years when building our dataset. The data were then cleaned by filling blank cells as follows:

  • If a waterbody's values for a parameter were blank in all three years, set all of the yearly values for that parameter to 0.
  • If a waterbody's value for a parameter was blank in one or two years, set each blank to the average of the waterbody's existing values for that parameter.
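
The two cleaning rules above can be sketched in pandas as follows. This is a minimal illustration and not the project's actual cleaning script: the fill_blanks helper, the wide "BOD 2019"-style column names, and the sample values are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fill_blanks(wide: pd.DataFrame, parameter: str, years=(2019, 2020, 2021)) -> pd.DataFrame:
    """Fill blank yearly cells of one parameter (hypothetical helper, one row per waterbody)."""
    cols = [f"{parameter} {year}" for year in years]
    out = wide.copy()
    # Row-wise mean of the values that exist (NaN when all three years are blank).
    row_mean = out[cols].mean(axis=1, skipna=True)
    # Rule 2: blanks in one or two years take the average of the existing values.
    for col in cols:
        out[col] = out[col].fillna(row_mean)
    # Rule 1: rows that were blank in every year are still NaN here; set them to 0.
    out[cols] = out[cols].fillna(0)
    return out


# Made-up sample values, for illustration only.
sample = pd.DataFrame(
    {
        "Body of Water": ["River A", "River B", "River C"],
        "BOD 2019": [2.0, np.nan, np.nan],
        "BOD 2020": [np.nan, np.nan, 4.0],
        "BOD 2021": [4.0, np.nan, 6.0],
    }
)
cleaned = fill_blanks(sample, "BOD")
print(cleaned)
```

In this sample, River A's missing 2020 value becomes the average of its 2019 and 2021 values, while River B, which has no BOD record at all, is set to 0 in every year.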

b. Research Question 1

The water quality parameter values were normalized because the variables have different units and scales. This ensures that each parameter contributes equally to the analysis and that no single parameter dominates because of its scale. The formula used for the normalized value is: (parameter value - mean) / standard deviation, where the mean and standard deviation are those of the parameter.
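
As a rough sketch of this normalization, the formula can be applied column by column in pandas. The zscore helper and the sample values are assumptions for illustration. The use of the population standard deviation (ddof=0) is also an assumption, since the text does not say which variant was used.

```python
import pandas as pd


def zscore(values: pd.DataFrame) -> pd.DataFrame:
    """Standardize each column: (value - column mean) / column standard deviation."""
    # ddof=0 (population standard deviation) is an assumption of this sketch.
    return (values - values.mean()) / values.std(ddof=0)


# Made-up sample values on very different scales.
params = pd.DataFrame(
    {"PH": [6.5, 7.0, 7.5, 8.0], "TSS": [10.0, 40.0, 70.0, 200.0]},
    index=["River A", "River B", "River C", "River D"],
)
normalized = zscore(params)
print(normalized)
```

After the transformation every column has mean 0 and standard deviation 1, so pH and TSS can be compared on the same scale.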


c. Research Question 2

The water quality parameter values were transformed because the variables contain outliers. This ensures that each parameter contributes equally to the analysis and that no single parameter dominates because of its extreme values. We used NumPy’s log transformation function (np.log1p()) and scikit-learn’s min-max scaler (MinMaxScaler()) to do this.
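
The two functions named above could be combined roughly as follows. The order of the steps (log transformation first, then scaling), the column names, and the sample values are assumptions of this sketch, not a copy of the project's notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up sample values; one fecal coliform reading is an extreme outlier.
params = pd.DataFrame(
    {"FECAL COLI": [20.0, 350.0, 1_600_000.0], "DO": [4.5, 6.0, 7.5]},
    index=["River A", "River B", "River C"],
)

# log1p(x) = log(1 + x): compresses large values and is defined at x = 0.
logged = np.log1p(params)

# Rescale each column to the [0, 1] range.
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(logged), index=params.index, columns=params.columns
)
print(scaled)
```

Without the log step, the single extreme fecal coliform reading would push the other two scaled values very close to 0. With it, the middle value keeps a visible distance from both ends of the range.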



2. Hypothesis Testing


Hypothesis Testing assesses if sample data reflects a true effect or is due to chance, guiding decisions with statistical evidence. The process involves setting up a null hypothesis that there is no effect or difference, and an alternative hypothesis that there is an effect or difference. Statistical tests are then performed to determine whether the observed data is significantly different from what would be expected under the null hypothesis. Upon consultation, we only tested the null hypothesis of research question 2.


Hypothesis testing document




a. Statistical Test: Multivariate Analysis of Variance (MANOVA)

Note: We aim to find the difference in water quality parameters (dependent variables) across the various water bodies and the years 2019 to 2021 (independent variables). Because the study includes multiple dependent variables that are potentially correlated, we use MANOVA instead of repeated ANOVAs. Using multiple ANOVAs would increase the risk of a Type I error (a significant finding that occurs by chance because the same test is repeated a number of times).
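
For readers unfamiliar with the test, here is a minimal sketch of a MANOVA, assuming the statsmodels library. It runs on synthetic data with only Year as the independent variable and three dependent variables, so it is simpler than the model described in this section. The effect sizes and sample sizes are made up.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Synthetic long-format data: BOD and DO drift upward each year, PH does not.
rng = np.random.default_rng(0)
rows = []
for shift, year in enumerate(["2019", "2020", "2021"]):
    for _ in range(30):
        rows.append(
            {
                "Year": year,  # string values, so the formula treats Year as categorical
                "BOD": rng.normal(5.0 + 3.0 * shift, 1.0),
                "DO": rng.normal(6.0 + 3.0 * shift, 1.0),
                "PH": rng.normal(7.0, 0.3),
            }
        )
long_df = pd.DataFrame(rows)

# Dependent variables on the left of "~", independent variable on the right.
manova = MANOVA.from_formula("BOD + DO + PH ~ Year", data=long_df)
result = manova.mv_test()

# Each term has a table with the four test statistics and their "Pr > F" p-values.
year_table = result.results["Year"]["stat"]
print(year_table)
```

The printed table lists Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root for the Year term. A "Pr > F" value below 0.05 would lead to rejecting the null hypothesis of no change across years.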


Reference


b. CSV File

Note: The raw dataset used for the statistical testing includes the following columns: Body of Water, BOD 2019, BOD 2020, BOD 2021, … , TSS 2019, TSS 2020, TSS 2021. This was eventually preprocessed into long format with the following columns: Body of Water, Year, BOD, CHLORIDE, COLOR, DO, FECAL COLI, NITRATE, PH, PHOSPHATE, TEMP, TSS.
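
The reshaping from the wide layout to the long layout described in this note could look roughly like the pandas sketch below. Only two parameters and two water bodies are shown, and the values are made up. The intermediate column names (column, value, parameter) are assumptions of the example.

```python
import pandas as pd

# Made-up wide-format sample: one column per parameter-year pair.
wide = pd.DataFrame(
    {
        "Body of Water": ["River A", "River B"],
        "BOD 2019": [2.0, 5.0],
        "BOD 2020": [3.0, 6.0],
        "BOD 2021": [4.0, 7.0],
        "TSS 2019": [10.0, 40.0],
        "TSS 2020": [12.0, 38.0],
        "TSS 2021": [11.0, 35.0],
    }
)

# Stack every "<PARAMETER> <YEAR>" column into (Body of Water, column name, value) rows.
melted = wide.melt(id_vars="Body of Water", var_name="column", value_name="value")

# Split the column name on its last space: "FECAL COLI 2019" -> ("FECAL COLI", "2019").
melted[["parameter", "Year"]] = melted["column"].str.rsplit(" ", n=1, expand=True)
melted["Year"] = melted["Year"].astype(int)

# One row per waterbody-year, one column per parameter.
long_df = (
    melted.pivot(index=["Body of Water", "Year"], columns="parameter", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(long_df)
```

Splitting on the last space rather than the first keeps two-word parameter names such as FECAL COLI intact.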


Code without log transformation

Code with log transformation


Here are the test results.

Hypothesis



We reject the Null Hypothesis, supporting the alternative hypothesis that there are significant changes in water quality parameters over the years 2019-2021.


We looked for statistically significant p-values (Pr > F) in each section. A low p-value (below the 0.05 level of significance) indicates that the variable or interaction term has a significant effect on the dependent variables. From the results, we found that Wilks' lambda is close to 0, Pillai's trace is close to 1, and the Hotelling-Lawley trace and Roy's greatest root have large F values, which supports our hypothesis that there is a significant change in the water quality parameters in the various water bodies from 2019 to 2021.







Without log transformation

With log transformation


  • Intercept: This row represents the overall intercept of the model. The statistics include Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root, which are different ways to measure the effect of the independent variables on the dependent variables.
  • Wilks' lambda: demonstrates the amount of variance accounted for in the dependent variable by the independent variable; the smaller the value, the larger the difference between the groups being analyzed.
  • Pillai's trace: another statistic measuring the effect of the independent variables on the dependent variables. It ranges from 0 to 1, and a higher value indicates a stronger effect.
  • Hotelling-Lawley trace: is used when the independent variable forms two groups and represents the most significant linear combination of the dependent variables. Reject the null hypothesis if this test statistic is large.
  • Roy's greatest root: is calculated in a similar fashion to Pillai's trace except it only considers the largest eigenvalue (i.e. the largest loading onto a vector). Increasing values for the statistic indicate increasing contributions by effects to the model in question.
  • Body_of_Water: Results specific to the variable “Body_of_Water”
  • Year: Results specific to the variable “Year”
  • Body_of_Water:Year: Represents interaction of “Body_of_Water” and “Year” and how the combination of these two variables influences the dependent variables.


Without log transformation

  • Year: Significant (p < 0.05)
  • Body_of_Water: Mixed significance, with Roy's Greatest Root significant
  • Body_of_Water:Year: Mixed significance, with Roy's Greatest Root significant

With log transformation

  • Year: Significant (p < 0.05)
  • Body_of_Water: Mixed significance, with Roy's Greatest Root significant
  • Body_of_Water:Year: Mixed significance, with Roy's Greatest Root significant



Research Question 1



How do water quality parameters vary across different regions and water bodies in the Philippines?



Interpretation:


Parameters like pH, Temperature, and Dissolved Oxygen show more uniform distributions, indicating less variation across regions. Parameters such as TSS, Nitrate, Fecal Coliform, Chloride, Color, BOD, and Phosphate show significant variability, highlighting areas with potential pollution concerns.







Parameters


pH and Temperature: These parameters are relatively stable across regions, with most values close to the mean. A few regions show slightly higher or lower values, but there are no extreme outliers.

Dissolved Oxygen (DO): Regions with blue cells have lower-than-average DO, which can indicate decreasing oxygenation and poor water quality. Dissolved oxygen (DO) concentrations are normally sufficient to maintain healthy biotic assemblages in unpolluted, free-flowing streams, but low or extremely high DO levels can impair or kill fishes and invertebrates.

Total Suspended Solids (TSS): Higher TSS levels (red cells) are observed in some regions, indicating higher sediment matter. Lower TSS levels (blue cells) suggest clearer water. When it comes to water quality, high TSS may decrease water's natural dissolved oxygen levels and increase water temperature.

Nitrate: Variability in nitrate levels is seen, with some regions showing high levels (red cells), which can indicate agricultural runoff and urban pollution. High nitrate levels are a concern for eutrophication and drinking water safety.

Fecal Coliform: Elevated fecal coliform levels (red cells) indicate potential contamination from different bacteria, posing health risks. Lower levels (blue cells) indicate cleaner water.

Chloride: High chloride levels are typically associated with having high or elevated total dissolved solids. These can be observed in urban and industrial regions, reflecting pollution from road salt, industrial discharge, and sewage.

Color: Highly colored water has significant effects on aquatic plants and algal growth. Light is very critical for the growth of aquatic plants and colored water can limit the penetration of light.

Biochemical Oxygen Demand (BOD): High BOD levels (red cells) are observed in regions with significant organic pollution, reducing oxygen availability for aquatic life. Lower BOD levels (blue cells) indicate less organic pollution.

Phosphate: Excessive phosphorus in surface water can cause explosive growth of aquatic plants and algae. This can lead to a variety of water-quality problems.

Urban Regions (NCR, Region 3, Region IV-A, Region 11, Region 12): Higher levels of pollutants like chloride, fecal coliform, color, BOD, and phosphate, reflecting urban runoff, industrial discharge, and sewage.

Type of Graph: Cluster map


Color Scale: Warmer colors represent higher normalized values of the water quality parameters while cooler colors represent lower values.

Rows and Columns: Each row represents a region-waterbody combination, while each column represents a water quality parameter.

Preprocessing: The water quality parameter values were normalized because the variables have different units and scales. This ensures that each parameter contributes equally to the analysis and that no single parameter dominates because of its scale. The formula used for the normalized value is: (parameter value - mean) / standard deviation.

Code: clustermap.py
CSV File: water_quality.csv
Graph: water_quality_cluster_map.png



Research Question 2



How have the water quality parameters for each waterbody changed from 2019 to 2021?



Interpretation:


Parameters such as pH, Dissolved Oxygen (DO), Chloride, Color, and Biochemical Oxygen Demand (BOD) are observed to be increasing through the years. Regions 3, 4A, 11, 12, and 13 have relatively higher levels of the 10 parameters. This signifies possible pollution and low water quality. The only parameter that showed a decrease in levels over the three years was temperature (in Region 4A).







Parameters


pH: There is a visible increase in pH values within the three years. Almost every waterbody obtained higher levels in 2020 and 2021. Although aquatic animals prefer higher pH levels, an extreme increase in pH levels can still negatively affect biodiversity.

Temperature: This parameter is relatively stable across regions. Only Region 4A has visible changes in coloring (darker to lighter), which means that temperatures in that region have been decreasing.

Dissolved Oxygen (DO): Most of the regions see an increase in Dissolved Oxygen through time, especially waterbodies in rural areas. This can be both good and bad. As mentioned in the first research question’s interpretation, low or extremely high DO levels can impair or kill fishes and invertebrates.

Total Suspended Solids (TSS): Most of the cells show stability with little changes in their colors. On the other hand, the TSS levels in waterbodies in Mindanao can be observed as relatively higher. Higher TSS may decrease water's natural dissolved oxygen levels and increase water temperature.

Nitrate: There is stability in almost all of the waterbodies. However, there are outliers with extremely high levels, potentially signifying urban pollution.

Fecal Coliform: Fecal Coliform levels are seen as stable. There are some rivers with high values, which indicates contamination.

Chloride: High and increasing levels of chloride can be observed in waterbodies in Regions 3 and 4A.

Color: There is not much change in color levels in these waterbodies. There are some with increasing values.

Biochemical Oxygen Demand (BOD): Regions 3 and 4A show high BOD levels. These are regions with significant organic pollution, reducing oxygen availability for aquatic life. On the other hand, most of these values are stable.

Phosphate: Most values are low and stable throughout the years except for waterbodies in Regions 3 and 4A. Excessive phosphorus in surface water can cause explosive growth of aquatic plants and algae. This can lead to a variety of water-quality problems.

NOTE: This is only one of the 10 graphs that were made to compare the changes in each water quality parameter within the three years. Please click here to view every graph.

Type of Graph: Heat map


Color Scale: Darker colors represent higher values of the water quality parameters (dirtier water) while lighter colors represent lower values (cleaner water), except for Dissolved Oxygen (DO) and pH.

Rows and Columns: Each row represents a region-waterbody combination, while each column represents a year from 2019 to 2021.

Preprocessing: The water quality parameter values were transformed because the variables contain outliers. This ensures that each parameter contributes equally to the analysis and that no single parameter dominates because of its extreme values. We used NumPy’s log transformation function (np.log1p()) and scikit-learn’s min-max scaler (MinMaxScaler()) to do this.

Code: rq_2_heat_maps.ipynb
CSV File: waterquality.csv
Graph: rq2_graphs



NutShell Graph


Flow of Progress: Water Quality Enhancements in Major Philippine Rivers (2019-2021)


How clean were some of the most impactful bodies of water from 2019 to 2021?




(document link)



The graph depicts the change in water quality from 2019 to 2021 for ten major rivers in the Philippines, chosen for their priority and impact according to previous research. The Y-axis represents the years, while the color gradient shows water quality: darker green indicates poorer quality, and lighter blue indicates better quality. Water quality was calculated as the share of parameters that 'pass', using the formula: number of passes / total number of parameters.

Key observations include:




Overall, the graph shows a general improvement in water quality over this period.




We represented 10 of the most impactful out of the 90+ water bodies. The impact was based on these sources: [Source 1] [Source 2]


Gradient color/value represents how ‘clean’ a body of water is based on the number of ‘passing’ water quality parameters per waterbody. A value of 1.0 is the brightest and represents the ‘cleanest’ overall. This value is calculated by taking the sum of all passing parameters and dividing by the total number of parameters.
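
As a small illustration of this calculation, the sketch below scores two made-up water bodies. The limits, the parameter subset, and the cleanliness_score helper are hypothetical and exist only to show the arithmetic. They are not the guideline values used in the study.

```python
import pandas as pd

# Hypothetical limits for illustration only; not the official guideline values.
MAX_LIMITS = {"BOD": 7.0, "TSS": 80.0}  # a parameter passes when value <= limit
MIN_LIMITS = {"DO": 5.0}                # a parameter passes when value >= limit


def cleanliness_score(row: pd.Series) -> float:
    """Return number of passing parameters / total number of parameters."""
    passes = [row[name] <= limit for name, limit in MAX_LIMITS.items()]
    passes += [row[name] >= limit for name, limit in MIN_LIMITS.items()]
    return sum(passes) / len(passes)


# Made-up readings for one year.
readings = pd.DataFrame(
    {"BOD": [3.0, 12.0], "TSS": [50.0, 120.0], "DO": [6.5, 5.5]},
    index=["River A", "River B"],
)
scores = readings.apply(cleanliness_score, axis=1)
print(scores)
```

Here River A passes all three checks and scores 1.0, while River B passes only the DO check and scores one third.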


Formula: value = number of passing parameters / total number of parameters

Color Scale:

Darker and greener colors represent ‘dirtier’ water with a lower number of ‘passing’ parameters, while lighter and bluer colors represent ‘cleaner’ water with a higher number of ‘passing’ parameters.


To better understand, let's discuss!



The results of the exploratory data analysis (EDA) support our hypothesis that there has been a significant change in the water quality parameters across various water bodies from 2019 to 2021. Our nutshell plot also shows a general improvement in water quality in rivers around the Philippines during this period. This upward trend holds great promise for the future of our rivers and suggests that efforts to minimize waste output into them have been somewhat effective. Moreover, the findings of our EDA could inform future clean-up initiatives and policy implementations that address problems in water quality, availability, and sanitation, especially in rural areas.


First research question: How do water quality parameters vary across different regions and water bodies in the Philippines?

Our analysis shows that while some parameters like pH, temperature, and dissolved oxygen (DO) are relatively uniform across regions, others such as total suspended solids (TSS), nitrate, fecal coliform, chloride, color, biochemical oxygen demand (BOD), and phosphate exhibit significant variability, especially in urban regions (NCR, Region 3, Region IV-A, Region 11, Region 12). This variability highlights regions that are more vulnerable to pollution. For instance, higher levels of pollutants like chloride, fecal coliform, color, BOD, and phosphate indicate urban runoff, industrial discharge, and sewage, pointing to pollution sources that need to be addressed. This information is crucial for future interventions, so that clean-up organizations and policymakers can target pollution sources more effectively.


Second research question: How have the water quality parameters for each waterbody changed from 2019 to 2021?

The temporal analysis of water quality parameters from 2019 to 2021 reveals notable trends: a steady increase in several parameters (pH, dissolved oxygen (DO), chloride, color, and biochemical oxygen demand (BOD)), while temperature was the only parameter that decreased over the three years. In addition, certain regions (Regions 3, 4A, 11, 12, and 13) exhibit relatively higher levels of these parameters, indicating persistent pollution issues. This could help in identifying which regions and waterbodies need to be prioritized, depending on the trends of their respective water quality parameters.


Socio-Economic Impacts

Our findings could have profound impacts on the socio-economic landscape of the Philippines. Better insights into water pollution countermeasures and improved water quality can positively impact livelihoods, public health, and biodiversity.

  • Livelihood (Fishing, Tourism, Economy): Cleaner bodies of water support better fishing conditions and enhance tourism, contributing positively to the economy.
  • Public Health (Sanitation, Access to Clean Water): Cleaner water reduces waterborne diseases, improving public health and reducing healthcare costs.
  • Biodiversity (Ecosystem): Improved water quality sustains aquatic life and biodiversity, supporting ecosystem health and resilience.


Limitations: Our data analysis was also constrained by factors in data availability, domain knowledge, and analysis technique.

  • Data Availability: We employed purposive sampling to gather data that includes our desired variables. Purposive sampling in our research context introduces potential biases such as limited generalizability and sampling frame bias. Additionally, our data was sourced from government websites and had gaps that we addressed during preprocessing (e.g., by calculating averages), potentially introducing statistical noise that affected the overall accuracy. External factors such as the pandemic and the unknown data collection methods of our sources introduce uncertainties that could have influenced their data, thereby impacting our analysis.
  • Domain Knowledge: We lack expertise in the field of water quality and sanitation, resulting in an incomplete understanding of water quality parameters and their interactions. This deficiency affects our analysis and interpretation.
  • Analysis Technique: We employed MANOVA as our statistical technique for hypothesis testing because we aimed to analyze multiple, potentially correlated variables together. Because of its multivariate nature, the results are complex and interpreting them can be challenging. MANOVA is also sensitive to outliers and missing data. We addressed outliers by employing log transformations and handled missing data by either computing averages or removing those cases entirely. Overall, our analysis technique is subject to several limitations that depend heavily on the organization and cleanliness of our data, which in turn affects the accuracy of the results.

Moving forward, to mitigate the impact of these challenges, we plan to implement several strategies. Firstly, to address our lack of domain knowledge, we will collaborate with experts in water quality and sanitation to gain deeper insights into relevant parameters and their interrelationships. Furthermore, we intend to explore advanced analysis techniques, including machine learning algorithms, to better understand and interpret the data, taking into account potential biases in data collection. By adopting these approaches, we aim to improve the accuracy of our future analyses.



In conclusion,



There is a significant change in the water quality parameters in the various water bodies from 2019 to 2021. Our nutshell plot shows a general improvement in water quality in rivers around the Philippines over this period.


The variability of parameters such as total suspended solids (TSS), nitrate, fecal coliform, chloride, color, biochemical oxygen demand (BOD), and phosphate highlights regions with potential pollution concerns. Regions 3, 4A, 11, 12, and 13 exhibit relatively higher levels of these parameters, indicating persistent pollution issues.








How is our study important and relevant to the current status of our rivers and other water bodies?


So, what shall we Filipinos do now?






Together, let us utilize the insights from this study to take proactive steps towards ensuring clean water for present and future generations!


Contact Us

Yanni Ella

I’m Yanni, a 2nd-year Computer Science undergraduate student at the University of the Philippines Diliman. I have a strong passion for learning new things and thrive on challenges and competition. My enthusiasm for exploring innovative solutions and my dedication to continuous learning drive me to pursue computer science. I’m excited to make more meaningful contributions to the field.

I enjoy playing competitive games such as Valorant and League of Legends. My competitive spirit extends beyond gaming: I also love playing sports and going on travel adventures. To unwind, I enjoy binge-watching movies and series.

Email: ycella@up.edu.ph
Facebook: fb.com/YanniJoseElla

Jensen Rabatan

Hello! I'm Jensen and I'm a 2nd-year Computer Science student at the University of the Philippines Diliman. I’m fueled by instant coffee and I have a passion for gaming. I even started programming so I could start developing my own games.

I constantly try to improve my skills, tackling new challenges every day. This extends to gaming as well. I find immense satisfaction in playing difficult games like Dark Souls and Hollow Knight because of how rewarding overcoming their challenges can be.

Email: jmrabatan@up.edu.ph
Github: @JRabatan
Facebook: Jensen Rabatan

Dean Ramirez

I'm Dean, a 4th-year Computer Science student at the University of the Philippines Diliman. I have a knack for solving small personal problems using applications that I built myself. This involves the use of data science, research, and software/web development.

I enjoy arts and music as much as I enjoy programming and solving problems on my own. I also love playing games and writing poetry. Naturally, my hobby is finding more hobbies to try!

Email: dpramirez@up.edu.ph
Github: @ansem7
Facebook: Dean Ramirez

Nadinne Sumulong

Hi! I’m Nadinne, a 4th-year Computer Science student at the University of the Philippines Diliman, currently in my final semester! Right now, I’m busy preparing for the job hunting season and finishing my thesis. Exciting times ahead!

I’m passionate about leadership and volunteering, both within and beyond the university. I thrive in event organization and love being part of the core team—it energizes me and gives me a sense of purpose. Whether it's leading a project or lending a hand, I’m always eager to contribute and make a difference!

Email: njsumulong@up.edu.ph
Github: @sumulongnj
Facebook: Nadinne Sumulong



© 2024 Water Boys