At present, I am a Machine Learning Data Scientist at GitHub, leading projects focused on bringing intelligence into everyday developer workflows. I helped launch the first ML/NLP pipeline at GitHub with the release of GitHub Topics, which auto-suggests topics for millions of repositories. Over the last decade, I have worked on a range of data problems, including topic extraction, sentiment analysis, text summarization, content-based recommendations, and search. I take pride in building solutions that not only show promising results but also scale to large amounts of data.
I received my Ph.D. in Computer Science, with a focus on Text Mining, Machine Learning, NLP, and Search, from the University of Illinois at Urbana-Champaign. My advisor was Dr. ChengXiang Zhai. When time permits, I write articles on text analytics and related topics.
I have authored more than 10 first-author papers at top-tier venues such as WWW, COLING, IEEE Big Data, and the Information Retrieval Journal, and I am a co-inventor on multiple industry patents. I also periodically serve as a program committee member and task chair for top-tier conferences, including ACL, NAACL, ECIR, CIKM, and AMIA. A list of my publications and patents can be found on Google Scholar.