Sumedh Khodke



Custom Named-Entity-Recognition using BERT (Python, Pytorch, FastAPI)

Designed and developed a custom NER model using transfer learning on BERT. The model acts as a resume parser that extracts user-defined fields such as Education, Skills, Experience, Location, and Interests. It achieved an accuracy of 74% and was served as an application wrapped with FastAPI.
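As an illustration of how token-level NER predictions can be turned into resume fields, here is a minimal sketch. It assumes the model emits BIO-style tags (B-SKILL, I-LOCATION, and so on), which the project description does not confirm; the function name `group_entities` and the label names are hypothetical.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (label, text) spans.

    Assumption: tags follow the BIO scheme. An I- tag that does not
    continue the current span is treated like O and dropped.
    """
    entities: List[Tuple[str, str]] = []
    label, words = None, []

    def flush() -> None:
        # Emit the span collected so far, if any.
        if label is not None:
            entities.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            flush()
            label, words = None, []
    flush()
    return entities
```

For example, the tokens `["Python", "and", "PyTorch", "in", "New", "York"]` with tags `["B-SKILL", "O", "B-SKILL", "O", "B-LOCATION", "I-LOCATION"]` yield two SKILL spans and one LOCATION span.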

Automatic Speech Recognition using Wav2Vec2 XLSR-53 for low-resource languages (Python, Pytorch, Huggingface)

Fine-tuned the facebook/wav2vec2-large-xlsr-53 model for Marathi ASR on the Open SLR64 Marathi dataset. Achieved a Word Error Rate (WER) of 12.70% on the test set, with input audio sampled at 16 kHz. The project was implemented as part of the Hugging Face XLSR open-source sprint for low-resource languages.
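For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used in the project; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With the reference `"a b c d"` and the hypothesis `"a x c"`, there is one substitution and one deletion over four reference words, so the WER is 0.5.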

NFL RosterGen: Optimizing NFL roster construction using Genetic Algorithms

Roster construction and optimization for the NFL, aimed at better team construction strategies. A family of genetic algorithms objectively selects players, coupled with an ML-based fitness function that evaluates a team's quality.
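The general shape of such a genetic algorithm can be sketched as follows. This is a toy illustration under stated assumptions, not the project's implementation: the player pool is random, the salary cap and all parameters are made up, and the `fitness` function is a simple rating-minus-penalty stand-in for the ML-based fitness model described above.

```python
import random


def make_pool(rng, n=30):
    # Hypothetical players with made-up ratings and salaries.
    return [{"rating": rng.uniform(50, 99), "salary": rng.uniform(1, 10)}
            for _ in range(n)]


def fitness(roster, pool, cap):
    # Stand-in for a learned fitness model: total rating, with a
    # penalty for each unit of salary over the cap.
    rating = sum(pool[i]["rating"] for i in roster)
    salary = sum(pool[i]["salary"] for i in roster)
    return rating - 100.0 * max(0.0, salary - cap)


def crossover(a, b, size, rng):
    # The child samples from the union of both parents, so players stay unique.
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), size)))


def mutate(roster, pool_size, rate, rng):
    roster = list(roster)
    for k in range(len(roster)):
        if rng.random() < rate:
            # Swap in a player who is not already on the roster.
            roster[k] = rng.choice([i for i in range(pool_size) if i not in roster])
    return tuple(sorted(roster))


def evolve(pool, size=5, cap=30.0, pop_size=40, generations=50, rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(len(pool)), size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda r: fitness(r, pool, cap), reverse=True)
        elite = ranked[: pop_size // 4]  # elitism: the top quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, size, rng), len(pool), rate, rng))
        population = elite + children
    return max(population, key=lambda r: fitness(r, pool, cap))
```

Calling `evolve(make_pool(random.Random(1)))` returns a tuple of player indices. Because the elite carries over unchanged each generation, the best fitness in the population never decreases.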

NLP Playground (Python, Pytorch-Lightning, FastAPI, AWS Lambda)

Trained and fine-tuned several domain-specific BERT-variant language models for downstream NLP tasks: masked and unmasked text prediction and abstractive text summarization. The solution was deployed on a serverless cloud platform.

Prediction and Semantic Classification of Myers-Briggs Personality Types from user provided text (Python, Pytorch)

Sponsored final-year undergraduate research project. Architected a model that, given user-written text, predicts the writer's MBTI personality type, with a comparative study of classification techniques across word-vector representations and embeddings. Achieved an accuracy of 78% on the predicted personality type.

Extractive Text Summarization using Transformers (Python, Pytorch)

Performed extractive text summarization on Amazon product reviews across multiple categories. Improved the generalization of BART, T5, and PEGASUS to out-of-domain text inputs through fine-tuning.