My journey as a Data Scientist started in 2005, when I embarked on a project to predict story sequences in transcribed audio text. I was a graduate research assistant at that time. Since then, I have developed and deployed multiple machine learning, text mining and NLP pipelines that have stood the test of time. I am now a Data Scientist with GitHub, leading several projects relating to source code analysis and intelligent software security models. In the recent past, I helped launch the first ML + NLP pipeline at GitHub with the release of GitHub Topics, which auto-suggests topics for millions of repositories and suggests related topics to explore.
I received my Ph.D. in Computer Science with a focus on Text Mining, Machine Learning and Search from the University of Illinois at Urbana-Champaign. My advisor was Dr. ChengXiang Zhai. My experience in the area of Text Analytics includes sentiment analysis, topic extraction, text summarization, content-based recommendations, entity linking, intelligent web crawling and search. I have also worked on projects relating to text segmentation and log analysis, such as software repository commit logs and search logs. My approach to solving all these different problems has always been to keep models and algorithms understandable, and thus explainable, scalable and usable in practice.
I have authored over ten first-author papers at top-tier Data Mining and NLP venues such as WWW, COLING, NAACL, IEEE Big Data and Information Retrieval Journal, and I am a co-inventor on multiple industry patents. I also periodically serve as a program committee member and task chair for top AI and NLP conferences including ACL, AAAI, NAACL, ECIR, CIKM and AMIA. A list of my publications and patents can be found on Google Scholar. As I get time, I share text-mining and related articles in the hope that they will be useful to other researchers and engineers in the field.
To get in touch with me, connect with me on LinkedIn or email me at email@example.com.