Open Source Data Science

Projects

An open source project that applies data science and machine learning to collecting, parsing, cleaning, and analyzing data from publicly available government sources.

Enabled price-per-quantity analysis by using NLP to mine product quantity information from product descriptions.
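As a rough illustration of the idea, the sketch below pulls a quantity out of a free-text product description and uses it to compute a price per base unit. It is not the project's pipeline: it swaps in a plain regular expression from the Python standard library in place of an NLP model, and the unit table, the function names `extract_quantity` and `price_per_unit`, and the sample descriptions are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical unit table (an assumption for this sketch):
# maps a unit spelling to its base unit and a conversion multiplier.
UNIT_FACTORS = {
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
    "ml": ("ml", 1.0),
    "l": ("ml", 1000.0),
}

# A number (optionally with decimals) followed by one of the known units.
QUANTITY_RE = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|ml|l)\b",
    re.IGNORECASE,
)


def extract_quantity(description: str) -> Optional[Tuple[float, str]]:
    """Return (amount in base units, base unit) for the first match, or None."""
    match = QUANTITY_RE.search(description)
    if match is None:
        return None
    base_unit, factor = UNIT_FACTORS[match.group("unit").lower()]
    return float(match.group("amount")) * factor, base_unit


def price_per_unit(price: float, description: str) -> Optional[float]:
    """Return price per base unit, or None if no usable quantity is found."""
    quantity = extract_quantity(description)
    if quantity is None or quantity[0] == 0:
        return None
    return price / quantity[0]
```

A regular expression like this only covers descriptions that follow a "number plus unit" pattern; messier text (multipacks, spelled-out units, ambiguous numbers) is where a tokenizer and rule-based or statistical NLP would likely be needed.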

Performed a waste generation analysis for a company wanting to convert agricultural waste into fertilizer.

Derived a method to separate current data from incorrectly dated historical data carried over from the previous tracking system.

Technology

  • Pandas
  • Python
  • spaCy
  • Jupyter Notebook
  • Scikit-Learn
  • NumPy
  • CatBoost
  • Optuna