Stack Overflow Dataset

This dataset contains 20,000 Stack Overflow questions in JSON format. Each question has 19 attributes.

Reading the dataset from Python is simple:


import pandas as pd

# read the JSON file (one record per line) into a DataFrame
df = pd.read_json("data/stackoverflow-data-idf.json", lines=True)

# print the schema and the dataset dimensions
print("Schema:\n\n", df.dtypes)
print("Number of questions, columns =", df.shape)
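
If pandas is not available, the file can probably be read with the standard library alone. The following is a minimal sketch, assuming the file is in JSON Lines format (one JSON object per line), which is what lines=True in the pandas call suggests. The helper name read_json_lines is hypothetical and not part of the dataset's tooling.

```python
import json


def read_json_lines(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Calling read_json_lines("data/stackoverflow-data-idf.json") should then give one dictionary per question, and the keys of the first dictionary can be inspected to see the attribute names.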
 

Links

  • Stack Overflow Dataset in JSON
  • Keyword extraction using this dataset