Projects

Yelp Recommendation Engine

Mar 2020 - May 2020

  • Designed a recommendation engine in Python (Hadoop, PySpark, and SparkSQL) that predicts a user's rating for any business (on a scale of 5) and displays the top-rated businesses, using user-based and item-based collaborative filtering.

  • Scaled the engine (by tuning hyperparameters) to run within 300 s on the Yelp training dataset of over 1.5 GB.

  • Implemented a hybrid recommendation model, combining Pearson correlation with feature extraction using XGBoost, and handled cold starts, to obtain a root mean squared error (RMSE) under 1.0 and an accuracy of 98.7% on the testing dataset.
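
The user-based collaborative filtering step with Pearson correlation can be sketched in plain Python as below. This is a minimal illustration, not the project's PySpark code: the function names, the toy ratings, the neutral default of 3.0 for cold starts, and the choice to ignore negatively correlated users are all assumptions made for the example.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the businesses both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[b] for b in common) / len(common)
    mean_b = sum(ratings_b[b] for b in common) / len(common)
    num = sum((ratings_a[b] - mean_a) * (ratings_b[b] - mean_b) for b in common)
    den_a = sqrt(sum((ratings_a[b] - mean_a) ** 2 for b in common))
    den_b = sqrt(sum((ratings_b[b] - mean_b) ** 2 for b in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def predict(user, business, user_ratings, default=3.0):
    """Predict a rating as the user's mean plus weighted neighbour deviations."""
    target = user_ratings.get(user)
    if not target:
        return default  # cold start: unknown user gets an assumed neutral rating
    user_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other, ratings in user_ratings.items():
        if other == user or business not in ratings:
            continue
        weight = pearson(target, ratings)
        if weight <= 0:
            continue  # assumption: only positively correlated users contribute
        other_mean = sum(ratings.values()) / len(ratings)
        num += weight * (ratings[business] - other_mean)
        den += weight
    if den == 0:
        return user_mean  # no usable neighbours: fall back to the user's mean
    return min(5.0, max(1.0, user_mean + num / den))


# Hypothetical toy data: user -> {business: stars}
toy_ratings = {
    "u1": {"b1": 5, "b2": 3, "b3": 4},
    "u2": {"b1": 4, "b2": 2, "b3": 3, "b4": 4},
    "u3": {"b1": 1, "b2": 5, "b4": 2},
}
prediction = predict("u1", "b4", toy_ratings)
```

Here `u2` rates in the same pattern as `u1`, so its above-average rating of `b4` pulls the prediction above `u1`'s own mean, while `u3` is negatively correlated and is skipped.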


Task Manager

May 2020 - Jul 2020

  • Created a task manager app using Node.js, MongoDB, Postman (for testing HTTP requests), and Heroku (for deployment).

  • Features include new user registration, user authentication, adding and modifying tasks, image upload, GPS location services using the Mapbox API, and deadline reminder notifications on the user's homepage.

  • Incorporated unit and integration tests to verify the CRUD operations and HTTP requests.


Cluster Detection

Oct 2019 - Dec 2019

  • Built a social media graph in PySpark by connecting users who have rated similar businesses (with an adjustable similarity threshold).

  • Implemented the Girvan-Newman algorithm in Python/SparkSQL to find the betweenness values for 1500 edges.

  • Used these betweenness values to hierarchically cluster the graph into tightly knit communities by maximising a modularity function.
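
One round of the betweenness-then-split idea above can be sketched in plain Python. This is an illustrative sketch only, not the project's SparkSQL code: the helper names and the six-node toy graph (two triangles joined by one bridge edge) are assumptions, and a full run would repeat the edge removal and keep the split with the highest modularity.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Girvan-Newman edge betweenness: one BFS per root, credits summed bottom-up."""
    betweenness = defaultdict(float)
    for root in adj:
        dist = {root: 0}
        paths = defaultdict(float)  # number of shortest paths from root
        paths[root] = 1.0
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)
        credit = {node: 1.0 for node in order}
        for node in reversed(order):  # farthest nodes first
            for parent in parents[node]:
                share = credit[node] * paths[parent] / paths[node]
                betweenness[tuple(sorted((parent, node)))] += share
                credit[parent] += share
    # Every pair of nodes is counted from both endpoints, so halve the totals.
    return {edge: value / 2 for edge, value in betweenness.items()}


def components(adj):
    """Connected components of the graph, found by BFS."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in group:
                    group.add(nb)
                    queue.append(nb)
        seen |= group
        result.append(group)
    return result


def modularity(adj, communities):
    """Q = (1/2m) * sum over same-community pairs of (A_ij - k_i * k_j / 2m)."""
    m = sum(len(nbs) for nbs in adj.values()) / 2
    q = 0.0
    for community in communities:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)


# Hypothetical toy graph: two triangles joined by the bridge edge (3, 4).
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = edge_betweenness(graph)
top_edge = max(scores, key=scores.get)

# Remove the highest-betweenness edge from a copy and score the resulting split
# against the degrees of the original graph.
cut = {node: set(nbs) for node, nbs in graph.items()}
cut[top_edge[0]].discard(top_edge[1])
cut[top_edge[1]].discard(top_edge[0])
split = components(cut)
q = modularity(graph, split)
```

The bridge carries every shortest path between the two triangles, so it gets the highest betweenness and is the first edge removed.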


SMS Spam Detection

Jan 2020 - Feb 2020

  • Created a Jupyter Notebook to classify a set of 5574 SMS texts as spam or ham, using text mining (TF-IDF) and Naive Bayes.

  • Performed data preprocessing, including removing punctuation and stop words, and converted the cleaned data into count-vector (bag-of-words) format.

  • Then applied multinomial Naive Bayes classification to separate messages into spam or ham with 98% accuracy.
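
The preprocessing and classification steps above can be sketched from scratch as follows. This is a minimal sketch under stated assumptions: it uses raw word counts rather than TF-IDF weights, a tiny hand-picked stop-word list, add-one (Laplace) smoothing, and four made-up messages; the notebook itself presumably relied on library implementations and the full dataset.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the project).
STOP_WORDS = {"a", "the", "to", "you", "is", "for", "at", "in", "and", "are", "your"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def train(messages):
    """Build per-label word counts from (label, text) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of messages
    vocab = set()
    for label, text in messages:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior under multinomial Naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (label, word) pairs from zeroing the score.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training messages for illustration only.
training = [
    ("spam", "WIN a free prize now!!!"),
    ("spam", "Free entry: claim your cash prize"),
    ("ham", "Are we meeting for lunch today?"),
    ("ham", "Call me when you are home"),
]
model = train(training)
label_one = classify("Claim your free prize", model)
label_two = classify("lunch at home today", model)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.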


Expensify

Jul 2020 - Aug 2020

  • Created a budgeting app using React (with Hooks), Redux, and Firebase.

  • Features include user authentication via Google Firebase; adding, modifying, and deleting expenses; and displaying expenses sorted by day, month, or year.

  • Added support for images, geo-tagging, and expense charts in the UI.

  • Deployed the app using Heroku.


Text Search using Apache Solr + PageRank

Sep 2020 - Nov 2020

  • Crawled nytimes.com and used Apache Tika to parse and load the crawled files into an Apache Solr core.

  • Extracted the link structure of the crawled files using Java's jsoup library and computed PageRank for each page using Python's NetworkX library.

  • Created a PHP web application to compare the search results for Lucene text search vs PageRank.

  • Used Ajax and jQuery to implement auto-complete, snippets, and spelling correction for search queries.
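
The PageRank step can be illustrated with a small power-iteration sketch. The project computed PageRank with NetworkX; the from-scratch version below only shows the underlying idea, and the damping factor of 0.85, the iteration count, and the page names are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of distinct outgoing links."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link structure, standing in for what a crawl might produce.
toy_links = {
    "home": ["world", "sports"],
    "world": ["home"],
    "sports": ["home"],
    "opinion": ["home"],
}
ranks = pagerank(toy_links)
best_page = max(ranks, key=ranks.get)
```

Pages that many others link to, such as `home` here, accumulate the most rank, while `opinion`, which nothing links to, keeps only the baseline share.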
