Zoe Beer (HC ‘26), Ferida Mohammed (BMC ‘26), Kripa Lamichhane (BMC ‘26)

Discovery Center: Bird Data

Semester: Spring 2025

Praxis Course: DSCI 310: Data in Action

Faculty Advisor: Jennifer Spohrer

Field Site: The Discovery Center

Field Supervisor: Bria Wimberly

Praxis Poster: 

DSCI_PraxisPoster_KripaLamichhane_ZoeBeer_FeridaMohammed

 

Further Context:

This semester, our team collaborated with the Philadelphia Discovery Center to analyze their bird observation data. Situated in Philadelphia’s Fairmount Park, the Discovery Center emerged from a collaboration between the National Audubon Society and the Philadelphia Outward Bound School. A century-old abandoned reservoir was transformed into a unique wildlife sanctuary and vital stopover for over 100 bird species migrating along the Atlantic Flyway. Since opening in 2018, the Discovery Center has provided a space for Philadelphians to discover themselves in nature, practice leadership, and work toward a greener city. Audubon Mid-Atlantic uses the Discovery Center as a facility for research, science-based conservation initiatives, and educational programs throughout the Philadelphia region. The Center protects a habitat rarely found in a major urban area and fosters community engagement across Philadelphia through bird-watching and environmental stewardship.

Our team’s objective was to support Audubon Mid-Atlantic’s mission to conserve and restore Pennsylvania’s natural ecosystems for the benefit of biological diversity. Early in our project, during weekly check-in meetings with our field supervisor, Bria Wimberly, we identified two primary but underused sources of bird data at the Center. First, the Center had been manually archiving observations on physical paper tally sheets, using a checklist system where visitors could mark the birds they saw; this made data storage and analysis cumbersome. Second, the Center uses eBird (ebird.org), which contains valuable digital observation records for birds seen at the East Park Reservoir location. However, that information is not clearly visualized and is not presented in a way that is understandable to individuals outside the birding community. Our team worked to address these challenges by developing more efficient data collection strategies and exploring new visualization techniques.

Considering both long-term implications and time constraints, our team divided tasks, set realistic milestones, and defined tangible deliverables. Our initial projects started broader in scope and were then streamlined into smaller, targeted projects aligned with each member’s analytical strengths. Throughout the semester, we kept a larger purpose in view as we produced our deliverables and met our goals: to render bird data at the Discovery Center more accessible, understandable, and engaging for the local community and visitors, enhancing their interactive experience with data and nature.

When it came to data visualization for our bird observation project, we prioritized creating the simplest and clearest analytical representation possible. Initially, we experimented in RStudio and with Plotly Express rather than the more common Matplotlib and Pandas stack, since Plotly offered the interactive mapping capabilities essential for geographic data. Our first approach displayed observation counts by bird name and time period, which had a notable advantage: observers could identify how frequently specific birds were spotted without needing taxonomic knowledge. However, this method created problems: the resulting map was cluttered and difficult to interpret without hovering over data points. Additionally, the scale disparity between rare sightings (1 observation) and common birds (up to 3,000 observations) meant data points for uncommon birds virtually disappeared from the map. To address these issues, we pivoted to grouping birds according to the Discovery Center’s standardized taxonomic categories. This significantly improved readability while reducing visual clutter. We preserved detailed information by programming hover functionality that displays specific bird names and observation counts within each category when users interact with data points. Adding distinct color coding for the different categories enhanced visual differentiation and intuitive understanding.
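
As a flavor of this grouped approach, here is a minimal Plotly Express sketch; the file name and columns (latitude, longitude, category, bird_name, count) are hypothetical stand-ins for the cleaned observation data, not our actual dataset.

```python
# A minimal sketch of the grouped-category map, assuming a cleaned
# DataFrame with hypothetical columns: latitude, longitude, category
# (the Center's taxonomic grouping), bird_name, and count.
import pandas as pd
import plotly.express as px

df = pd.read_csv("bird_observations.csv")  # hypothetical file name

fig = px.scatter_mapbox(
    df,
    lat="latitude",
    lon="longitude",
    color="category",            # one color per taxonomic category
    size="count",                # scale points by observation count
    hover_name="bird_name",      # show the specific species on hover
    hover_data={"count": True},  # and its observation count
    zoom=13,
)
fig.update_layout(mapbox_style="open-street-map")
fig.show()
```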

We then faced the challenge of making the visualization accessible to users with minimal coding experience. We implemented a dropdown feature that allows users to select any year they wish to visualize, making the interface more user-friendly and eliminating redundant code.
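
A minimal sketch of that year-selector dropdown, assuming the same hypothetical columns as above plus a year column: each year’s observations become one trace, and the dropdown toggles which trace is visible.

```python
# A minimal sketch of the year-selector dropdown (column and file
# names are hypothetical). One map trace per year; the dropdown
# switches visibility instead of duplicating plotting code.
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("bird_observations.csv")  # hypothetical file name
years = sorted(df["year"].unique())

fig = go.Figure()
for i, year in enumerate(years):
    subset = df[df["year"] == year]
    fig.add_trace(go.Scattermapbox(
        lat=subset["latitude"],
        lon=subset["longitude"],
        text=subset["bird_name"],
        name=str(year),
        visible=(i == 0),  # show only the first year initially
    ))

fig.update_layout(
    mapbox_style="open-street-map",
    mapbox_zoom=13,
    updatemenus=[{
        "buttons": [
            {
                "label": str(year),
                "method": "update",
                "args": [{"visible": [j == i for j in range(len(years))]}],
            }
            for i, year in enumerate(years)
        ],
    }],
)
fig.show()
```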

For distribution, we initially considered Google Colab but recognized its limitations for non-technical users, who would need to understand code execution. Instead, we created a website hosted on GitHub Pages, similar to an interactive visualization we had encountered in another data science class. This approach makes our visualization accessible without requiring coding knowledge. One current limitation is that the complex interactive elements require a larger screen for optimal viewing, making mobile access challenging. However, we are refining the code to make the website responsive, with plans to at least provide a static version for mobile users in the future.
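
The export step itself is small: Plotly can write an entire interactive figure to a self-contained HTML file, which can then be committed to a GitHub Pages repository. A minimal sketch, with a trivial placeholder figure standing in for our bird map:

```python
# A minimal sketch of exporting an interactive figure for GitHub Pages.
# The figure below is a trivial placeholder, not our actual map.
import plotly.express as px

fig = px.scatter(x=[1, 2, 3], y=[3, 1, 2])
# "cdn" keeps the file small by loading plotly.js from a CDN at view time.
fig.write_html("index.html", include_plotlyjs="cdn")
```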

We believe our work lays a foundation for future bird data visualization and analysis at the Discovery Center. While the current graphs rely on static, locally collected data, future iterations could integrate the eBird API to automate data collection and allow for periodic updates. Visualizations can also be refined to focus on specific species, offering more targeted insights that could help the Discovery Center create environments better suited to the needs of those birds. The Google Form we developed provides a starting point for a digital approach to recording and archiving monthly bird sightings, making long-term data management more efficient and opening the door for more dynamic visualizations. We hope the Discovery Center shares these visualizations with the public to gather feedback, which can guide future improvements and encourage greater community engagement with local bird populations.
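
For instance, eBird’s public API (API 2.0) exposes recent-observation endpoints. A hedged sketch of what that integration might look like, assuming a valid API key and with a placeholder location code rather than the East Park Reservoir’s actual eBird hotspot ID:

```python
# A hedged sketch of pulling recent sightings from the eBird API 2.0.
# The API key and location code below are placeholders, not real values.
import requests

API_KEY = "your-ebird-api-key"  # issued free at ebird.org
LOC_ID = "L0000000"             # placeholder eBird hotspot/location code

resp = requests.get(
    f"https://api.ebird.org/v2/data/obs/{LOC_ID}/recent",
    headers={"X-eBirdApiToken": API_KEY},
    timeout=30,
)
resp.raise_for_status()
for obs in resp.json():
    # "howMany" may be absent when a count was recorded as "X"
    # (present but uncounted), so fall back gracefully.
    print(obs["comName"], obs.get("howMany", "X"), obs["obsDt"])
```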

Through this project, we developed new technical skills and deepened our understanding of data visualization and analysis. We strengthened our RStudio abilities by working with new packages and creating clear, insightful visualizations tailored to complex ecological data. We also learned to clean and filter large datasets using Python, and explored different types of visualizations using libraries like Matplotlib and Plotly Express—gaining insight into which tools and features (like hover effects and interactivity) work best for different types of data. Beyond the technical aspects, we learned the importance of flexibility, iterative testing, and thoughtful design choices when presenting data in a way that highlights key trends and supports meaningful interpretation.

Cynthia Chen (BMC ‘25), Maika Kogawara (BMC ’27), Nayja Shah (BMC ‘27)

Centralizing Data Collection

Semester: Spring 2025

Praxis Course: DSCI 310: Data in Action

Faculty Advisor: Jennifer Spohrer

Field Site: Harriton House

Field Supervisor: Laura Carpenter

Praxis Poster: 

DSCI_Harriton House Praxis Poster_Revised

 

Further Context:

For this project, we collaborated with Harriton House, a historic house near the college that was home to many influential figures, including Charles Thomson. The Harriton Association, which maintains the property, was seeking funding so that it could continue its independent operations. To help Harriton House secure that funding, we worked on centralizing its data collection systems, generating headcount forms, and creating guides so staff could easily maintain the new systems.

We primarily used Excel forms, OneDrive, and Excel sheets as our tools for data collection and storage. To gather information from roamers, we created an Excel form, accessed via a QR code, with questions about demographics and group sizes. The QR code provides a convenient and accessible way to collect headcounts digitally, requiring only a few taps on a phone.
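
The QR code itself can be generated directly from the forms tool, but as an illustration, here is a minimal Python sketch using the qrcode library; the form URL is a placeholder, not the actual form link.

```python
# A minimal sketch of generating a QR code that points to the headcount
# form. The URL below is a placeholder.
import qrcode  # pip install qrcode[pil]

img = qrcode.make("https://forms.office.com/r/your-form-id")
img.save("headcount_form_qr.png")  # print and post at the site
```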

One major issue Harriton House faced was finding an efficient way to manage its volunteer check-in and check-out system. Prior to this project, the Harriton staff had been tracking volunteer hours on paper, which made it hard for students in the nearby area to document their volunteer participation for credit, and hard for the Association to cite those hours in funding proposals.

We decided to explore Microsoft Excel, as its macro functionality was perfect for our goal of creating an easy, simple, and quick method for volunteers to track their hours. Microsoft Excel is also free on the App Store, so volunteers can download it and check in and out from their own devices. Through trial and error, and by adapting code we found on YouTube channels such as Barb Henderson’s, we were able to generate a fully automated Excel sheet that tracks when volunteers clocked in and out and the total hours they worked throughout the week. We implemented volunteer ID numbers as a way for volunteers to clock in and out swiftly. These IDs are unique 4-digit numbers that Harriton staff can assign based on the volunteer’s birthdate. Our hope is that this Excel sheet will take workload off of Harriton staff and put quantitative data on hand for funding proposals. This spreadsheet, however, only works for regular volunteers. For event-specific volunteers, we created a separate Excel form for volunteer logins during specific events; volunteers record their start and end times on this form.
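
Our automation lives in Excel macros, but purely as an illustration of the underlying bookkeeping, here is the same clock-in/clock-out arithmetic sketched in Python; the file and column names (volunteer_id, clock_in, clock_out) are hypothetical.

```python
# An illustrative Python sketch of the spreadsheet's bookkeeping logic
# (the actual implementation is an Excel macro). Column names are
# hypothetical: volunteer_id, clock_in, clock_out.
import pandas as pd

log = pd.read_csv("volunteer_log.csv", parse_dates=["clock_in", "clock_out"])

# Hours per shift, then weekly totals per 4-digit volunteer ID.
log["hours"] = (log["clock_out"] - log["clock_in"]).dt.total_seconds() / 3600
log["week"] = log["clock_in"].dt.isocalendar().week
weekly_totals = log.groupby(["volunteer_id", "week"])["hours"].sum()
print(weekly_totals)
```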

This project helped cultivate community connections with the Harriton staff, taught us the importance of patience, and allowed us to practice data governance. This praxis course, Data in Action, focuses on a variety of topics, one of them being data governance: how do we ensure that data is stored safely, not taken advantage of, and reproducible? To implement data governance, we worked to ensure that users consented to their data being used, that our process was reproducible, and that the stored data was not identifiable. We informed the people filling out our headcounting survey why we were asking for this information and asked for their permission to use it for headcounting. Then, to ensure reproducibility, we created guidelines, specifically for the volunteer check-in and check-out information: instructions for staff on how to clean the dataset and conduct analyses, as well as instructions for the volunteers who would fill out the form. The third aspect was fulfilled by using the four-digit ID numbers to label data rather than the personal information of the people who filled out the forms, preventing re-identification.

This project aimed to help digitize Harriton House’s data and make it easier for the staff to aggregate the data needed to secure funding. We are excited to see the staff use our materials, and we are hopeful that the Harriton Association will, in due course, receive the funding it needs to continue operating independently.

Ruth Tilahun (BMC ’26), Kelli Eng (BMC ’26), Jenny Le (BMC ’26), Gioanna Zhao (BMC ’26), David Dai (HC ’26)

Automating Data Collection and Analysis for Solar Energy Initiatives

Semester: Spring 2025

Praxis Course: DSCI 310: Data in Action

Faculty Advisor: Jennifer Spohrer

Field Site: Philadelphia Solar Energy Association (PSEA)

Field Supervisor: Liz Robinson, Rob Celentano

Praxis Poster: 

DSCI_Jenny_Le_RevisedPraxisPoster

 

Further Context:

During our time in the Data in Action course, we had the opportunity to explore a crucial question about data and social impact: what does it take to use data responsibly in service of social good? Over the semester, we explored the legal, ethical, and historical dimensions of data use while partnering directly with local non-profits to co-create a data project that reflected their values, needs, and mission. We learned to critically examine how data is produced and interpreted, and how thoughtful design and communication can make data more useful. Through hands-on work, we gained insight into both the power and the responsibility that come with using data in the public sphere.

Our team partnered with the Philadelphia Solar Energy Association (PSEA), a non-profit that promotes solar energy adoption across Pennsylvania through advocacy, education, and community engagement. PSEA’s main challenge related to data collection and visualization: solar installation data was scattered across different platforms, inconsistently formatted, and difficult to update. This limited their ability to create timely, effective visual materials to inform the public and support clean energy initiatives. The goal of our project was to streamline a data collection and visualization process that had often been handled by a single person. We developed a sustainable, code-based process to gather, clean, and visualize solar data from public sources such as AEPS, SEIA, and PJM. Using Python, we created scripts that automated data extraction and analysis, delivering user-friendly, updatable plots in a Jupyter Notebook format.

Each team member contributed to the project in a unique way. One member focused on scraping and organizing the data, experimenting with different Python libraries to handle inconsistent formats and shifting web structures. Another led the visualization efforts, creating clear, interpretable charts such as histograms, bar graphs, and bubble plots to illustrate trends in solar adoption. Other teammates documented the full workflow and assembled the project deliverables, ensuring our work would be easy for PSEA to maintain long-term. Throughout the semester, we met biweekly with PSEA staff to present our progress and adapt our approach based on their needs. By the end, we had a working system that helped streamline their outreach efforts and gave us a real sense of what it means to do data work that matters.
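
As a flavor of what those scripts do, here is a minimal sketch of the clean-then-plot pattern; the file name and columns (state, year, installed_mw) are hypothetical stand-ins for the public-source exports, not PSEA’s actual data.

```python
# A minimal sketch of the clean-then-plot pattern, assuming a CSV
# exported from one of the public sources (file and column names
# are hypothetical): state, year, installed_mw.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("solar_installations.csv")
df = df.dropna(subset=["installed_mw"])  # drop incomplete rows
pa = df[df["state"] == "PA"]             # focus on Pennsylvania

# Bar chart of installed capacity by year, ready to re-run on new data.
pa.groupby("year")["installed_mw"].sum().plot(kind="bar")
plt.ylabel("Installed capacity (MW)")
plt.title("Pennsylvania solar installations by year")
plt.tight_layout()
plt.savefig("pa_solar_by_year.png", dpi=200)
```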

One experience that stood out during this project was the pivot in our final deliverable format. In the beginning, our team aimed to create a product that would require no back-end work from PSEA. This took the form of an API that would run visualizations from our Python scripts and deploy them to a separate website. In discussion with our supervisors, we decided that this format would ultimately not serve PSEA’s goals, so we pivoted to a Jupyter Notebook. Initially, this felt like a setback, because the scripts then required some effort from PSEA to download external data sources. In troubleshooting this issue, however, we wrote documentation for the data import process. Our final deliverable, while not 100% hands-off, still decreases PSEA’s overall workload, and the experience gave us deeper insight into how crucial it is to communicate openly with partners about technical limitations and updates.

This course and our partnership with PSEA allowed us to gain technical skills as well as tools for data analysis, collaboration, and project design. We learned how data can shine a light on possibilities for community advocacy, and we will take with us the ability to communicate our work clearly while handling data responsibly.

Lucy Cambefort (BMC ’25), Angie Quiroz (BMC ’26), Fiona Shen (BMC ’27)

Data Visualization for Reservoir Concentration

Semester: Spring 2025

Praxis Course: DSCI 310: Data in Action

Faculty Advisor: Jennifer Spohrer

Field Site: Discovery Center

Field Supervisor: Bria Wimberly

Praxis Poster: 

DSCI_WaterProject_Revised_PraxisPoster-compressed

 

Further Context:

This semester, our team collaborated with The Discovery Center to create a visualization model demonstrating the evolution of abiotic factors over time, using data collected from Lake Vickers on the Bryn Mawr campus, provided by Professor Tom Mozdzer. The reservoir at The Discovery Center spans 38 acres and is just under 8 feet deep. Structurally, it resembles a bathtub, with steep walls leading down to a flat bottom composed of concrete and brick. Originally, water was pumped in from the Schuylkill River, but it is now primarily replenished through precipitation. It is currently the largest body of freshwater in Philadelphia.

By collecting and publishing data on the water chemistry of the reservoir, the community can learn more about its biodiversity and, in turn, help improve the environmental conditions of the Center. This will also make Professor Mozdzer’s data available to the college for further research on campus sustainability. To make this work, we reproduced the graphs from ZentraCloud in R and created a StoryMap using KnightLab to make the data accessible and informative to the public.

We were able to use the data continuously uploaded in real time to the ZentraCloud platform through the sensors Tom Mozdzer installed in Lake Vickers. Months’ worth of information on the environment and chemistry of the water, stored on ZentraCloud, proved an invaluable source for building our model. However, ZentraCloud is only accessible with the credentials of a paid account, which is expensive and in turn makes it difficult to open the data to more people.

We took on the added challenge of recreating a graph from ZentraCloud using RStudio, which is free and publicly available, to serve as a model for making this data accessible to Bryn Mawr. This would also allow the data to be used in other classes and further the research started by Professor Mozdzer. We wrote code for plotting graphs from CSV files, which can serve as an outline for the Discovery Center once it collects its own data from the reservoir using the sensors it intends to install. This process pushed our coding skills and our ability to make use of the resources provided by the college, such as the office hours held by the Digital Humanities department.
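
Our code is written in R, but as an illustration of the CSV-to-graph workflow, here is the same idea sketched in Python; the file and column names (timestamp, dissolved_oxygen) are hypothetical.

```python
# An illustrative sketch of the CSV-to-time-series workflow (the actual
# version is in R). File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

water = pd.read_csv("lake_vickers.csv", parse_dates=["timestamp"])
water.plot(x="timestamp", y="dissolved_oxygen")
plt.ylabel("Dissolved oxygen (mg/L)")
plt.title("Lake Vickers: dissolved oxygen over time")
plt.show()
```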

To display the graphs created with RStudio, we searched for a platform that would allow The Discovery Center to present monthly water chemistry data from different locations in the reservoir and ultimately embed it on their website. We found StoryMap, a user-friendly KnightLab tool that lets users attach descriptions and graphs to locations by their coordinates, which is important for mapping sample points in the reservoir. It is a great way to incorporate advanced, customized visualizations built in R. While StoryMap focuses on spatial data, StoryLine is another KnightLab tool that displays information uploaded from spreadsheets as interactive graphs. It also offers descriptions for specific data points, allowing the public to understand changes in dissolved oxygen, pH, and temperature over time, along with other chemistry data.

Our next step for the Discovery Center is to present the code we have been working on, along with our data visualizations, to the staff in a virtual presentation in May. We hope they will use our visualization model as an example and apply it to their own water data. In the long run, they could produce code that regularly updates their visualizations with real-time data and embed the StoryMap on their website. This would allow communities and the public to view the water data and gain insights into water chemistry at the Center, such as temperature, dissolved oxygen, and pH, while also covering biotic factors such as aquatic insects. This will support the Center’s goal of measuring how those populations vary over time and across locations, and of identifying what is affecting them.