My Python Cheat Sheet

These are some Python code snippets that I use very often.

Topics include list manipulation, JSON manipulation, pandas data frames, and more.

Python List Manipulation

Concatenate two python lists

listone = [1,2,3]
listtwo = [4,5,6]

joinedlist = listone + listtwo

Convert a python string to a list of characters

word = 'abc'
the_list = list(word)

JSON Manipulation

Convert a dictionary to a json string

import json

r = {'expert': 'True', 'rating': 1.5}
r = json.dumps(r)

Convert a JSON string back to a Python dictionary

import json

json_string = '{"expert": "True", "rating": 1.5}'
my_dict = json.loads(json_string)
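A quick round trip can confirm that `json.dumps` and `json.loads` undo each other. This is only a sketch; the `record` contents are made up for illustration.

```python
import json

# Hypothetical record; note that a real boolean becomes JSON `true`
record = {"expert": True, "rating": 1.5}

json_string = json.dumps(record)    # '{"expert": true, "rating": 1.5}'
restored = json.loads(json_string)  # back to a Python dict

print(restored == record)  # True
```

Keep in mind that the string 'True' and the boolean True are different values, and each survives the round trip as its own type.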

Load a JSON file into a pandas data frame

import pandas as pd

# this assumes one JSON object per line in the file (JSON Lines format)
df = pd.read_json("path_to_json_file", lines=True)
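To try this without a file on disk, the same call can read from an in-memory buffer. A minimal sketch, assuming two invented sample lines:

```python
import io

import pandas as pd

# Two made-up JSON objects, one per line (JSON Lines format)
json_lines = '{"name": "rita", "age": 23}\n{"name": "gita", "age": 45}\n'

# StringIO makes the string behave like an open file
df = pd.read_json(io.StringIO(json_lines), lines=True)

print(df.shape)  # (2, 2)
```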

Pandas DataFrame Manipulation

Group by a column and keep the column afterwards

# aggregate_function is a placeholder for a real aggregation such as sum, mean or count
df.groupby(['column_name']).aggregate_function().reset_index()
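Here is a small concrete version of the pattern above, using `sum` as the aggregation. The `team` and `score` columns are hypothetical.

```python
import pandas as pd

# Hypothetical data: two rows for team "a", one for team "b"
df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 2, 5]})

# Without reset_index(), "team" would become the index instead of a column
totals = df.groupby(["team"]).sum().reset_index()

# as_index=False is another way to keep the grouping column
totals_alt = df.groupby(["team"], as_index=False).sum()

print(totals.to_dict(orient="records"))
```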

Convert a data frame to a list of dictionaries

Let’s say you want a list of dictionaries from a pandas data frame as follows:

From this:

   name  age
0  rita   23
1  gita   45

To this:

[{"name":"rita","age":23}, {"name":"gita","age":45}]

This is the code you would use:

dict_vals = df.to_dict(orient='records')
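To see the conversion end to end, here is a sketch that builds a data frame matching the target output above, assuming it has just `name` and `age` columns:

```python
import pandas as pd

# Data frame assumed to match the example: one row per person
df = pd.DataFrame({"name": ["rita", "gita"], "age": [23, 45]})

# orient='records' produces one dictionary per row
dict_vals = df.to_dict(orient="records")

print(dict_vals)
```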

Convert a dictionary to a pandas data frame

Let’s say you have a dict as follows:

my_dict={'mrr':0.4,'map':0.3,'precision':0.6}

To convert this to a pandas Data Frame, you can do the following:

import pandas as pd

my_dict={'mrr':0.4,'map':0.3,'precision':0.6}
pd.DataFrame(list(my_dict.items()),columns=['metric','value'])

You will see the following output:

      metric  value
0        mrr    0.4
1        map    0.3
2  precision    0.6

Select rows matching specific column criteria

Let’s say you want to find rows where the column values satisfy specific constraints. You could use the following:

import pandas as pd
# each condition needs its own parentheses because & binds tighter than <=
df = df[(df['column1'] <= 2) & (df['column2'] <= 3)]
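The filter is easier to follow when the boolean mask is built as a separate step. A minimal sketch with made-up values for `column1` and `column2`:

```python
import pandas as pd

# Made-up values; only the first row satisfies both conditions
df = pd.DataFrame({"column1": [1, 2, 3, 1], "column2": [3, 4, 2, 5]})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df["column1"] <= 2) & (df["column2"] <= 3)

filtered = df[mask]
print(filtered)
```

Use `|` for "or" and `~` for "not"; the Python keywords `and`, `or` and `not` do not work element-wise on a Series.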

Create a new data frame column with specific values

Let’s say you want to add a column to a data frame with values generated by some external processing. You can collect the external values into a list with one value per row and do the following:

vals=[1,2,3,4]

df['vals']=vals

Sort data frame by value

# Sort in descending order; sort_values returns a new data frame
df = df.sort_values(by=["column_name1", "column_name2"], ascending=False)

Get unique values from a data frame column

# Get unique list of values in the df['column_name'] column
uniq_vals=list(df.column_name.unique())

Create a new derived column with df.apply

The goal here is to create a new column with values populated based on the values of an old column. Let’s say you want a new column that adds 1 to a value from an old column.

# Generate a new value from `an_existing_column`
# generate_a_value(x) is a python function that generates a value 
# based on the column value from `an_existing_column`

df['my_new_column'] = df['an_existing_column'].apply(lambda x: generate_a_value(x))
 

If you want to pass two or more columns for processing:

# Generate a new value from two or more existing columns

df['my_new_column'] = df.apply(lambda x: generate_a_value(x.column1,x.column2),axis=1)
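Here is a runnable sketch of both forms. The helpers `add_one` and `total` are hypothetical stand-ins for `generate_a_value`, and the column values are made up.

```python
import pandas as pd


def add_one(value):
    """Hypothetical single-column helper."""
    return value + 1


def total(first, second):
    """Hypothetical two-column helper."""
    return first + second


df = pd.DataFrame({"column1": [1, 2, 3], "column2": [10, 20, 30]})

# Series.apply calls the function once per value
df["plus_one"] = df["column1"].apply(add_one)

# DataFrame.apply with axis=1 calls the function once per row
df["row_total"] = df.apply(lambda row: total(row.column1, row.column2), axis=1)

print(df)
```

For simple arithmetic like this, `df["column1"] + 1` does the same thing and is usually faster; `apply` is most useful when the logic cannot be written as column operations.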
 

Select / display specific columns from a data frame

# select specific columns in a data frame
df= df[['my_col1','my_col2']]

System Commands

Run a system command from within Python code

import os

os.system("your_system_command")
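If you also need the command's output or want a failure to raise an error, the standard library's `subprocess.run` is an alternative to `os.system`. This is a sketch; the command here just runs the current Python interpreter so that it works on any platform.

```python
import subprocess
import sys

# Pass the command as a list so no shell quoting is needed
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

output = result.stdout.strip()
print(output)  # hello
```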

File / Directory Operations

Safely create nested directories in Python

import os

#check if path exists, if not create the directory
if not os.path.exists(directory):
    os.makedirs(directory)
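The check above can still fail if another process creates the directory between the check and the `makedirs` call. A sketch of a variant that avoids this with `exist_ok=True`, using a temporary base path chosen just for illustration:

```python
import os
import tempfile

# Temporary base directory so the example does not touch real paths
base = tempfile.mkdtemp()
directory = os.path.join(base, "level1", "level2", "level3")

# Creates all missing parent directories; no error if it already exists
os.makedirs(directory, exist_ok=True)

# Calling it a second time is a harmless no-op
os.makedirs(directory, exist_ok=True)

print(os.path.isdir(directory))  # True
```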

Evaluation

Compute per-class precision, recall, f1 scores

The goal here is to compute per-class precision, recall and f1 scores and display the results using a data frame.

The first step is to collect your labels as two separate lists: (1) the predicted labels and (2) the corresponding true labels. For example:

predicted_labels=["positive","positive","negative"]
true_labels=["positive","other","negative"]

Once you have the true and predicted labels as lists, you can use sklearn’s `precision_recall_fscore_support` function to compute all the scores for you. Here’s how you do it:

import pandas as pd
from sklearn.metrics import precision_recall_fscore_support

# the possible labels
labels = ['positive', 'negative', 'other']

# Setting average=None returns precision, recall, f1 and support for each individual label
per_class_prf = precision_recall_fscore_support(true_labels, predicted_labels, average=None, labels=labels)

# Collect the precision, recall, f-score and support values for the 3 classes
precisions = per_class_prf[0]
recalls = per_class_prf[1]
fscores = per_class_prf[2]
supports = per_class_prf[3]

# Zip the values to make rows for each label
table_data = zip(labels, precisions, recalls, fscores, supports)

# Place the zipped values in a dataframe
df = pd.DataFrame(list(table_data),columns=['labels', 'precision', 'recall', 'f-score', 'num_of_examples'])
print(df.sort_values(by=['num_of_examples'], ascending=False))

Example output (from a larger set of labels than the three-example lists above):

     labels  precision    recall   f-score  num_of_examples
1  negative   0.875000  0.933333  0.903226               15
0  positive   0.636364  0.777778  0.700000                9
2     other   0.500000  0.200000  0.285714                5
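To sanity-check what the per-class numbers mean, precision and recall for a single label can be computed by hand. This is a plain-Python sketch on the three-example lists from earlier, not a replacement for sklearn; the helper `precision_recall` is a hypothetical name, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
predicted_labels = ["positive", "positive", "negative"]
true_labels = ["positive", "other", "negative"]


def precision_recall(label, true, predicted):
    """Return (precision, recall) for one label; 0.0 when undefined."""
    pairs = list(zip(true, predicted))
    # True positives: predicted as `label` and actually `label`
    tp = sum(1 for t, p in pairs if t == label and p == label)
    predicted_count = sum(1 for _, p in pairs if p == label)
    true_count = sum(1 for t, _ in pairs if t == label)
    precision = tp / predicted_count if predicted_count else 0.0
    recall = tp / true_count if true_count else 0.0
    return precision, recall


positive_scores = precision_recall("positive", true_labels, predicted_labels)
print(positive_scores)  # (0.5, 1.0)
```

Two examples were predicted as "positive" but only one really is, so precision is 0.5; the single true "positive" example was found, so recall is 1.0.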