These are some Python code snippets that I use very often.
Topics include list manipulation, JSON manipulation, pandas data frames, system commands, file and directory operations, and evaluation.
Python List Manipulation
Concatenate two Python lists
listone = [1, 2, 3]
listtwo = [4, 5, 6]
joinedlist = listone + listtwo
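Besides `+`, a few other standard ways to combine lists can be handy, depending on whether you want a new list or an in-place update. A minimal sketch using only built-ins and `itertools`:

```python
import itertools

listone = [1, 2, 3]
listtwo = [4, 5, 6]

# `+` builds a new list and leaves both originals unchanged
joinedlist = listone + listtwo

# Unpacking also builds a new list and accepts any iterables, not just lists
unpacked = [*listone, *listtwo]

# itertools.chain yields the items lazily; list() materializes them
chained = list(itertools.chain(listone, listtwo))

# extend() appends in place, so copy first if listone must stay untouched
extended = listone.copy()
extended.extend(listtwo)

print(joinedlist)  # [1, 2, 3, 4, 5, 6]
```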
Convert a Python string to a list of characters
word = 'abc'
the_list = list(word)
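`list()` splits a string into individual characters. A short sketch of the reverse direction, and of splitting into words instead (the sample sentence is made up for illustration):

```python
word = 'abc'
the_list = list(word)
print(the_list)  # ['a', 'b', 'c']

# ''.join() reverses the conversion
rejoined = ''.join(the_list)
print(rejoined)  # abc

# To split on whitespace into words rather than characters, use str.split()
sentence = 'python code snippets'
words = sentence.split()
print(words)  # ['python', 'code', 'snippets']
```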
JSON Manipulation
Convert a dictionary to a JSON string
import json
r = {'expert': 'True', 'rating': 1.5}
# json_string is a str: '{"expert": "True", "rating": 1.5}'
json_string = json.dumps(r)
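In the dictionary above, `'True'` is a string, so it is serialized as the JSON string `"True"`. A sketch with a real boolean instead, plus the `indent` and `sort_keys` options of `json.dumps` for readable, stable output:

```python
import json

r = {'expert': True, 'rating': 1.5}

# Python True/False/None are written as JSON true/false/null
compact = json.dumps(r)
print(compact)  # {"expert": true, "rating": 1.5}

# indent and sort_keys produce pretty-printed output with keys in sorted order
pretty = json.dumps(r, indent=2, sort_keys=True)
print(pretty)
```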
Convert a JSON string back to a Python dictionary
import json
json_string = '{"expert": "True", "rating": 1.5}'
my_dict = json.loads(json_string)
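A JSON round trip does not always return the exact dictionary you started with, because JSON object keys are always strings and JSON has no tuple type. A small sketch (the sample dictionary is illustrative):

```python
import json

original = {1: (1, 2), 'ok': None}

# dumps turns the int key into "1", the tuple into an array and None into null;
# loads then reads the array back as a list
round_tripped = json.loads(json.dumps(original))
print(round_tripped)  # {'1': [1, 2], 'ok': None}
```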
Load a JSON file into a pandas data frame
import pandas as pd
# This assumes one JSON object per line in the file (the JSON Lines format)
df = pd.read_json("path_to_json_file", lines=True)
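To try `lines=True` without a file on disk, the sketch below wraps some made-up JSON Lines text in `io.StringIO` in place of `"path_to_json_file"`. `to_json` with `orient='records', lines=True` writes the same format back out:

```python
import io

import pandas as pd

# Made-up JSON Lines content: one JSON object per line
jsonl_text = '{"name": "rita", "age": 23}\n{"name": "gita", "age": 45}\n'

# io.StringIO stands in for a file path so the example is self-contained
df = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df)

# Write the data frame back out as JSON Lines, then read it again
out = df.to_json(orient='records', lines=True)
reread = pd.read_json(io.StringIO(out), lines=True)
```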
Pandas DataFrame Manipulation
Group by a column and keep the column afterwards
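# Replace sum() with the aggregation you need, e.g. mean(), count() or max();
# reset_index() turns the group key back into a regular column
df.groupby(['column_name']).sum().reset_index()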
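A sketch with a small made-up data frame, showing that `as_index=False` is an alternative to `reset_index()` for keeping the grouping column:

```python
import pandas as pd

# Made-up sales data: two rows per store
df = pd.DataFrame({'store': ['a', 'b', 'a', 'b'], 'sales': [10, 20, 30, 40]})

# Without reset_index(), 'store' would become the index of the result
grouped = df.groupby(['store']).sum().reset_index()

# as_index=False keeps 'store' as a column in a single step
grouped_alt = df.groupby(['store'], as_index=False).sum()

print(grouped)
```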
Convert a data frame to a list of dictionaries
Let’s say you want a list of dictionaries from a pandas data frame as follows:
From this:
   name  age
0  rita   23
1  gita   45
To this:
[{"name":"rita","age":23}, {"name":"gita","age":45}]
This is the code you would use:
dict_vals=df.to_dict(orient='records')
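A sketch using the rita/gita example, assuming the data frame has exactly those two columns. `orient='list'` is shown for comparison, and passing the records back to `pd.DataFrame` rebuilds the frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['rita', 'gita'], 'age': [23, 45]})

# One dictionary per row
dict_vals = df.to_dict(orient='records')

# One list per column, keyed by column name
col_lists = df.to_dict(orient='list')

# A list of dictionaries converts straight back into a data frame
rebuilt = pd.DataFrame(dict_vals)

print(dict_vals)
```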
Convert a dictionary to a pandas data frame
Let’s say you have a dict as follows:
my_dict={'mrr':0.4,'map':0.3,'precision':0.6}
To convert this to a pandas data frame, you can do the following:
import pandas as pd
my_dict = {'mrr': 0.4, 'map': 0.3, 'precision': 0.6}
# Each (key, value) pair becomes one row
df = pd.DataFrame(list(my_dict.items()), columns=['metric', 'value'])
print(df)
You will see the following output:
      metric  value
0        mrr    0.4
1        map    0.3
2  precision    0.6
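Two other ways to build a data frame from the same dictionary are sketched below. Note that `pd.DataFrame(my_dict)` on its own fails for a dictionary of scalars (pandas asks for an index), which is why the last variant wraps the dictionary in a list to get a single wide row:

```python
import pandas as pd

my_dict = {'mrr': 0.4, 'map': 0.3, 'precision': 0.6}

# A Series keyed by metric name; rename_axis names the index so that
# reset_index() turns it into a 'metric' column
df1 = pd.Series(my_dict, name='value').rename_axis('metric').reset_index()

# One row per key, with the metric names as the index
df2 = pd.DataFrame.from_dict(my_dict, orient='index', columns=['value'])

# One row in total, with one column per metric
df3 = pd.DataFrame([my_dict])

print(df1)
```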
Select rows matching specific column criteria
Let’s say you want to find the rows whose column values satisfy specific constraints. You could use the following:
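import pandas as pd
# Keep rows where column1 <= 2 AND column2 <= 3;
# each condition needs its own parentheses because & binds more tightly than <=
df = df[(df['column1'] <= 2) & (df['column2'] <= 3)]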
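The same boolean-mask idea extends to OR (`|`), NOT (`~`) and membership tests with `isin`, and `DataFrame.query` accepts the conditions as a string. A sketch on a made-up data frame that reuses the placeholder column names:

```python
import pandas as pd

# Made-up data using the placeholder column names
df = pd.DataFrame({'column1': [1, 2, 3, 4], 'column2': [3, 5, 1, 7]})

# AND: both conditions must hold
both = df[(df['column1'] <= 2) & (df['column2'] <= 3)]

# OR: at least one condition holds
either = df[(df['column1'] <= 2) | (df['column2'] <= 3)]

# NOT: invert a mask with ~
not_small = df[~(df['column1'] <= 2)]

# query() expresses the AND filter as a string
same_as_both = df.query('column1 <= 2 and column2 <= 3')

# isin(): keep rows whose value is in a given set
in_set = df[df['column1'].isin([1, 4])]
```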
Create a new data frame column with specific values
Let’s say you want to add a column to a data frame with values generated by some external processing. Collect the external values in a list with one value per row, then assign it:
vals = [1, 2, 3, 4]
df['vals'] = vals
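A sketch of what happens on a made-up four-row data frame: a scalar is broadcast to every row, while a list of the wrong length is rejected with a `ValueError`:

```python
import pandas as pd

# Made-up data frame with four rows
df = pd.DataFrame({'a': [10, 20, 30, 40]})

vals = [1, 2, 3, 4]
df['vals'] = vals           # one value per row

df['source'] = 'external'   # a scalar is repeated for every row

# A list whose length differs from the number of rows is rejected
mismatch_error = ''
try:
    df['bad'] = [1, 2]
except ValueError as err:
    mismatch_error = str(err)
```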
Sort a data frame by column values
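# Sort in descending order by column_name1, then column_name2;
# sort_values returns a new data frame, so assign the result
df = df.sort_values(by=["column_name1", "column_name2"], ascending=False)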
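`ascending` also accepts one flag per sort column, which is useful when ties in the first column should be broken in the opposite order. A sketch with made-up `score` and `name` columns standing in for `column_name1` and `column_name2`:

```python
import pandas as pd

# Made-up data: two rows tie on score == 2
df = pd.DataFrame({'score': [2, 3, 2, 1], 'name': ['z', 'y', 'x', 'w']})

# Highest score first; ties broken alphabetically by name
sorted_df = df.sort_values(by=['score', 'name'], ascending=[False, True])

# Optional: renumber the rows 0..n-1 after sorting
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```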
Get unique values from a data frame column
# Get a list of the unique values in the df['column_name'] column
uniq_vals = list(df['column_name'].unique())
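`unique()` returns values in order of first appearance. If you need the number of distinct values or how often each value occurs, `nunique()` and `value_counts()` cover that. A sketch on made-up data:

```python
import pandas as pd

# Made-up column with repeated values
df = pd.DataFrame({'column_name': ['b', 'a', 'b', 'c', 'a']})

uniq_vals = list(df['column_name'].unique())    # order of first appearance
num_uniq = df['column_name'].nunique()           # number of distinct values
counts = df['column_name'].value_counts().to_dict()  # value -> occurrences

print(uniq_vals, num_uniq, counts)
```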
Create a new derived column with df.apply
The goal here is to create a new column whose values are computed from an existing column. For example, you might want a new column that adds 1 to each value of an existing column.
# Generate a new value from `an_existing_column`.
# generate_a_value(x) is a Python function that generates a value
# based on the column value from `an_existing_column`
df['my_new_column'] = df['an_existing_column'].apply(generate_a_value)
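A sketch of the "add 1" example described above, with a hypothetical `generate_a_value` and made-up data. For simple arithmetic like this, a vectorized column expression gives the same result without calling a Python function per row:

```python
import pandas as pd


def generate_a_value(x):
    # Hypothetical helper: adds 1 to the input value
    return x + 1


df = pd.DataFrame({'an_existing_column': [1, 2, 3]})

# Calls generate_a_value once per value in the column
df['my_new_column'] = df['an_existing_column'].apply(generate_a_value)

# Vectorized equivalent for simple arithmetic
df['vectorized'] = df['an_existing_column'] + 1

print(df)
```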
If you want to pass two or more columns to the function:
# Generate a new value from two or more existing columns;
# axis=1 passes each row to the lambda as a Series
df['my_new_column'] = df.apply(lambda row: generate_a_value(row['column1'], row['column2']), axis=1)
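A sketch with a hypothetical two-argument `generate_a_value` that multiplies its inputs, applied row by row to a made-up data frame:

```python
import pandas as pd


def generate_a_value(a, b):
    # Hypothetical helper combining two column values
    return a * b


df = pd.DataFrame({'column1': [1, 2, 3], 'column2': [4, 5, 6]})

# axis=1: the lambda receives one row at a time and picks out both columns
df['my_new_column'] = df.apply(
    lambda row: generate_a_value(row['column1'], row['column2']), axis=1)

print(df)
```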
Select/display specific columns from a data frame
# Select specific columns in a data frame
df = df[['my_col1', 'my_col2']]
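Double brackets return a data frame, while single brackets return a Series. A sketch on a made-up data frame, including the equivalent `.loc` selection:

```python
import pandas as pd

# Made-up data frame with one extra column to drop
df = pd.DataFrame({'my_col1': [1, 2], 'my_col2': [3, 4], 'other': [5, 6]})

subset = df[['my_col1', 'my_col2']]         # DataFrame with two columns
single = df['my_col1']                      # Series
single_as_df = df[['my_col1']]              # DataFrame with one column
via_loc = df.loc[:, ['my_col1', 'my_col2']]  # same selection via .loc
```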
System Commands
Run a system command from within Python code
import os
os.system("your_system_command")
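`os.system` returns only the exit status. If you also need the command's output, `subprocess.run` from the standard library can capture it. A sketch that runs a tiny Python command so it works on any platform (swap in your own command as a list of arguments):

```python
import subprocess
import sys

# Run a command given as a list of arguments; check=True raises
# CalledProcessError on a non-zero exit status
result = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    capture_output=True,
    text=True,
    check=True,
)

output = result.stdout.strip()
print(output)  # hello
```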
File / Directory Operations
Safely create nested directories in Python
import os
directory = "path/to/nested/directory"  # placeholder path
# Check if the path exists; if not, create it along with any missing parent directories
if not os.path.exists(directory):
    os.makedirs(directory)
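The check-then-create pattern can still fail if another process creates the directory between the `exists()` call and `makedirs()`. Passing `exist_ok=True` avoids that race. A sketch using a temporary base directory, plus the `pathlib` equivalent:

```python
import os
import tempfile
from pathlib import Path

# Temporary base directory so the example does not touch real paths
base = tempfile.mkdtemp()
directory = os.path.join(base, 'a', 'b', 'c')

# Creates a, a/b and a/b/c; no error if they already exist
os.makedirs(directory, exist_ok=True)
os.makedirs(directory, exist_ok=True)  # calling it again is harmless

# pathlib version: parents=True creates missing parents
Path(base, 'x', 'y').mkdir(parents=True, exist_ok=True)
```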
Evaluation
Compute per-class precision, recall and F1 scores
The goal here is to compute per-class precision, recall and F1 scores and display the results in a data frame.
The first step is to collect your labels as two separate lists: (1) the predicted labels and (2) the corresponding true labels. For example:
predicted_labels = ["positive", "positive", "negative"]
true_labels = ["positive", "other", "negative"]
Once you have the true and predicted labels in lists, you can use sklearn’s `precision_recall_fscore_support` function to compute all the scores for you. Here’s how you do it:
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support
# The possible labels
labels = ['positive', 'negative', 'other']
# Setting average=None returns precision, recall, F1 and support for each individual label
precisions, recalls, fscores, supports = precision_recall_fscore_support(
    true_labels, predicted_labels, average=None, labels=labels)
# Zip the values to make one row per label
table_data = zip(labels, precisions, recalls, fscores, supports)
# Place the zipped values in a data frame
df = pd.DataFrame(list(table_data), columns=['labels', 'precision', 'recall', 'f-score', 'num_of_examples'])
print(df.sort_values(by=['num_of_examples'], ascending=False))
Example output (from a larger set of labels than the three-item example above):
     labels  precision    recall   f-score  num_of_examples
1  negative   0.875000  0.933333  0.903226               15
0  positive   0.636364  0.777778  0.700000                9
2     other   0.500000  0.200000  0.285714                5
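To see what these numbers mean, here is a plain-Python sketch that computes the same per-class scores by hand for the three-item example lists, using precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R). It treats 0/0 as 0.0, which matches scikit-learn's default `zero_division` behavior (scikit-learn also emits a warning in that case):

```python
predicted_labels = ["positive", "positive", "negative"]
true_labels = ["positive", "other", "negative"]
labels = ["positive", "negative", "other"]


def safe_div(num, den):
    # Treat 0/0 as 0.0 instead of raising ZeroDivisionError
    return num / den if den else 0.0


rows = []
pairs = list(zip(true_labels, predicted_labels))
for label in labels:
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    support = tp + fn  # number of true examples of this label
    rows.append((label, precision, recall, f1, support))

for row in rows:
    print(row)
```

For `positive`, one of the two positive predictions is correct (precision 0.5) and the single true positive example is found (recall 1.0), giving F1 = 2/3. `other` is never predicted, so its precision, recall and F1 are all 0.0.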