
Projects
Yelp Recommendation Engine
Mar 2020 - May 2020
-
Designed a recommendation engine in Python (Hadoop, PySpark, and Spark SQL) that predicts a user's rating for any business (on a 5-star scale) and displays the top-rated businesses, using user-based and item-based collaborative filtering.
-
Scaled the engine by tuning hyperparameters so that it runs within 300 s on the Yelp training dataset (over 1.5 GB).
-
Implemented a hybrid recommendation model combining Pearson correlation with an XGBoost model trained on extracted features, and mitigated the cold-start problem, achieving a root mean squared error (RMSE) under 1.0 and 98.7% accuracy on the test dataset.

Task Manager
May 2020 - Jul 2020
-
Created a task manager app using Node.js, MongoDB, Postman (for testing HTTP requests), and Heroku (for deployment).
-
Features include new-user registration, user authentication, adding and modifying tasks, image upload, GPS location services using the Mapbox API, and deadline reminder notifications on the user's homepage.
-
Wrote unit and integration tests to verify that CRUD operations and HTTP requests work correctly.

Cluster Detection
Oct 2019 - Dec 2019
-
Built a social graph in PySpark by connecting users who rated similar businesses, with edges determined by an adjustable similarity threshold.
-
Implemented the Girvan-Newman algorithm in Python/Spark SQL to compute the betweenness values of 1,500 edges.
-
Used these betweenness values to hierarchically cluster the graph into tightly knit communities, choosing the partition that maximises modularity.

SMS Spam Detection
Jan 2020 - Feb 2020
-
Created a Jupyter Notebook that classifies 5,574 SMS messages as spam or ham using text mining (TF-IDF) and Naive Bayes.
-
Preprocessed the data by removing punctuation and stop words, then converted the cleaned text into a token-count matrix with CountVectorizer.
-
Applied multinomial Naive Bayes classification to label messages as spam or ham with 98% accuracy.

Expensify
Jul 2020 - Aug 2020
-
Created a budgeting app using React (with Hooks), Redux, and Firebase.
-
Features include user authentication via Google Firebase; adding, modifying, and deleting expenses; and displaying expenses sorted by day, month, or year.
-
Added support for images, geo-tagging, and expense charts in the UI.
-
Deployed the app using Heroku.

Text Search using Apache Solr + PageRank
Sep 2020 - Nov 2020
-
Crawled nytimes.com and used Apache Tika to parse and load the crawled files into an Apache Solr core.
-
Extracted the link structure of the crawled pages with the Java library jsoup, and computed the PageRank of each page with Python's NetworkX library.
-
Built a PHP web application that compares search results ranked by Lucene's default relevance scoring against results ranked by PageRank.
-
Used Ajax and jQuery to add autocomplete, result snippets, and spelling correction to search queries.
