data projects

  • Taylor Talk

    As Swifties refreshed Ticketmaster for Eras Tour seats, my team combined our love for Taylor Swift’s music with AWS cloud computing. We dived into features of Taylor Swift’s music and posts in her Reddit community, building models that flagged controversial posts and discriminatory content. Our natural language processing analysis also revealed how moods and reactions shifted across the community, especially around tour announcements and album releases.

    Language: Python

    Tools: AWS, Amazon SageMaker, PySpark, scikit-learn, NLTK, RegEx, Pandas, and NumPy

  • Walkability in Washington, DC

    We explored how walkability correlates with people's lives in Washington, D.C. By analysing walkability data and Reddit conversations about transportation, we found that areas where people could easily walk and bike tended to have healthier, more prosperous outcomes. This highlighted a concerning pattern - many lower-income neighbourhoods lacked safe, walkable streets and bike lanes, which was linked to poorer health in these communities. I shared these findings at the 2023 Evolution of Data Science Conference.

    Languages: R and Python

    Tools: Altair, Plotly, ggplot2, matplotlib, Seaborn, Pandas, NumPy, GeoPandas, and SciPy

  • Workforce Diversity

    In light of massive layoffs, I was curious about what makes people stay in or leave their jobs, so I obtained a large dataset on the workforces of six public companies from Revelio Labs. By experimenting with various machine learning approaches - from straightforward Naive Bayes to more complex decision trees that achieved 92.8% accuracy - I uncovered interconnections between diversity, pay, and job satisfaction. One of my most intriguing findings challenged conventional wisdom: higher-paid employees were more likely to seek new opportunities elsewhere.

    Languages: R and Python

    Tools: Pandas, NumPy, and scikit-learn

  • Assessing Food Additive Safety with FDA

    For my graduate capstone project, I worked with the Office of Food Additive Safety within the Food and Drug Administration (FDA) to assess food safety. I used natural language processing and machine learning techniques to analyse 30,000 scientific papers, identifying and extracting relationships among food products, pathogens, and adverse events. The office incorporated our findings into its work to spot potential food safety issues before they could become problems.

    Language: Python

    Tools: Pandas, NLTK, spaCy, and textacy