Python Cheat Sheet for Data Science Practitioners

These are Python code snippets that I use very often.

Pandas DataFrame Manipulation

Ensure a data frame column is a float type column

df['column_name'] = df['column_name'].astype(float)

Group by a column and keep the column afterwards
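
One way to do this, as a minimal sketch: pass `as_index=False` to `groupby` so the grouping column stays a regular column instead of becoming the index. The data frame and the column names `city` and `sales` are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row per sale
df = pd.DataFrame({"city": ["a", "a", "b"], "sales": [1, 2, 5]})

# as_index=False keeps "city" as a normal column in the result
totals = df.groupby("city", as_index=False)["sales"].sum()

# Same result via the index: df.groupby("city")["sales"].sum().reset_index()
print(totals)
```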


Convert a list of lists into a pandas data frame

import pandas as pd
df = pd.DataFrame(your_list_of_lists, columns=['column1', 'column2', 'column3'])

Convert a data frame to a list of dictionary values

Let’s say you want a list of dictionaries from a pandas data frame as follows:

From this:

   name  age
0  rita   23
1  gita   45

To this:

[{"name":"rita","age":23}, {"name":"gita","age":45}]

This is the code you would use:
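
A minimal sketch using `DataFrame.to_dict` with `orient="records"`, which returns one dictionary per row. The data frame is rebuilt here from the name/age example so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example data frame with the name and age columns
df = pd.DataFrame({"name": ["rita", "gita"], "age": [23, 45]})

# orient="records" produces a list with one {column: value} dict per row
list_of_dicts = df.to_dict(orient="records")
print(list_of_dicts)
```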


Convert a dictionary to a pandas data frame

Let’s say you have a dict as follows:


To convert this to a pandas Data Frame, you can do the following:

import pandas as pd
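
A minimal sketch of the conversion, assuming a dict that maps metric names to scores. The name `my_dict` and its contents are assumptions inferred from the output table.

```python
import pandas as pd

# Assumed input: metric name -> score
my_dict = {"mrr": 0.4, "map": 0.3, "precision": 0.6}

# Each (key, value) pair becomes one row with the given column names
df = pd.DataFrame(list(my_dict.items()), columns=["metric", "value"])
print(df)
```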

You will see the following output:

      metric  value
0        mrr    0.4
1        map    0.3
2  precision    0.6

Select rows matching specific column criteria

Let’s say you want to find rows where the column value matches a specific constraint. You could use the following:

import pandas as pd
# wrap each condition in parentheses; & combines them element-wise
df = df[(df['column1'] <= 2) & (df['column2'] <= 3)]

Create a new data frame column with specific values

Let’s say you want to add an additional column to a data frame with values generated via some external processing. You can transform the external values into a list and do the following:
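
A minimal sketch of that assignment. The data frame, the list `external_values` and the column name `age` are hypothetical. The list needs exactly one value per row, in row order.

```python
import pandas as pd

df = pd.DataFrame({"name": ["rita", "gita"]})

# Hypothetical values produced by some external processing, one per row
external_values = [23, 45]

# Assigning a list of matching length creates the new column
df["age"] = external_values
print(df)
```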


Sort data frame by value

# Sort in descending order
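
A minimal sketch using `sort_values`. The data frame and the column name `score` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [0.3, 0.6, 0.4]})

# ascending=False sorts from the largest value to the smallest
df = df.sort_values(by="score", ascending=False)
print(df)
```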

Get unique values from a data frame column

# Get unique list of values in the df['column_name'] column
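
A minimal sketch using `Series.unique`, which returns the distinct values in order of first appearance. The sample data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["a", "b", "a", "c"]})

# unique() returns an array; tolist() turns it into a plain Python list
unique_values = df["column_name"].unique().tolist()
print(unique_values)
```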

Create a new derived column with df.apply

The goal here is to create a new column with values populated based on the values of an old column. Let’s say you want a new column that adds 1 to a value from an old column.

# Generate a new value from `an_existing_column`
# generate_a_value(x) is a Python function that generates a value
# based on the column value from `an_existing_column`
df['my_new_column'] = df['an_existing_column'].apply(generate_a_value)

If you want to pass two or more columns for processing:

# Generate a new value from two or more existing columns
df['my_new_column'] = df.apply(lambda x: generate_a_value(x.column1, x.column2), axis=1)
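
To make both patterns concrete, here is a small runnable sketch of the "add 1" case. The function `add_one` is a hypothetical stand-in for `generate_a_value`, and the column `other` is made up.

```python
import pandas as pd

df = pd.DataFrame({"an_existing_column": [1, 2, 3], "other": [10, 20, 30]})

def add_one(x):
    # Hypothetical stand-in for generate_a_value
    return x + 1

# One input column: apply the function to each value of the Series
df["plus_one"] = df["an_existing_column"].apply(add_one)

# Several input columns: apply over rows (axis=1) and read the fields from each row
df["row_sum"] = df.apply(lambda row: row.an_existing_column + row.other, axis=1)
print(df)
```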

Select/display specific columns from a data frame

# select specific columns in a data frame
df = df[['my_col1', 'my_col2']]

Python List Manipulation

Concatenate two python lists

listone = [1,2,3]
listtwo = [4,5,6]
joinedlist = listone + listtwo

Convert a python string to a list of characters

word = 'abc'
the_list = list(word)

Randomize contents of python list

import random
the_list = ["item1", "item2", "item3"]
# shuffle reorders the list in place and returns None
random.shuffle(the_list)

JSON Manipulation

Convert a dictionary to a json string

import json
r = {'expert': 'True', 'rating': 1.5}
r = json.dumps(r)

Convert a json string back to a python dictionary

import json
my_dict = json.loads(json_string)

Load a json file into a pandas data frame

import pandas as pd
# this assumes one JSON object per line in the JSON file
df = pd.read_json("path_to_json_file", lines=True)

System Commands

Run a system command from within Python code

import os
os.system("ls -l")  # "ls -l" is only a placeholder command

File / Directory Operations

Safely create nested directories in Python

import os
# check if the path exists; if not, create the directory (and any missing parents)
if not os.path.exists(directory):
    os.makedirs(directory)


Compute per-class precision, recall, f1 scores

The goal here is to compute per-class precision, recall and f1 scores and display the results using a data frame.

The first step is to collect your labels as two separate lists: (1) the predicted labels and (2) the corresponding true labels. For example:
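
A minimal sketch of what the two lists could look like. The names `true_labels` and `predicted_labels` and the six made-up examples are assumptions for illustration, not real results.

```python
# Hypothetical labels for six examples; position i in both lists
# refers to the same example
true_labels = ["positive", "negative", "negative", "other", "positive", "negative"]
predicted_labels = ["positive", "negative", "positive", "other", "negative", "negative"]

# Both lists must have the same length, one entry per example
assert len(true_labels) == len(predicted_labels)
```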


Once you have the true and predicted labels in two lists, you can use sklearn’s `precision_recall_fscore_support` function to compute all the scores for you. Here’s how you do it:

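import pandas as pd
from sklearn.metrics import precision_recall_fscore_support
# the possible labels
labels = ['positive', 'negative', 'other']
# setting average to None returns precision, recall and f1 scores for individual labels
# true_labels and predicted_labels (assumed names) are the two lists collected in the first step
per_class_prf = precision_recall_fscore_support(true_labels, predicted_labels, average=None, labels=labels)
# Collect the precision, recall, f-score and support values for the 3 classes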
precisions = per_class_prf[0]
recalls = per_class_prf[1]
fscores = per_class_prf[2]
supports = per_class_prf[3]
# Zip the values to make rows for each label
table_data = zip(labels, precisions, recalls, fscores, supports)
# Place the zipped values in a dataframe
df = pd.DataFrame(list(table_data),columns=['labels', 'precision', 'recall', 'f-score', 'num_of_examples'])
print(df.sort_values(by=['num_of_examples'], ascending=False))

Example output:

     labels  precision    recall   f-score  num_of_examples
1  negative   0.875000  0.933333  0.903226               15
0  positive   0.636364  0.777778  0.700000                9
2     other   0.500000  0.200000  0.285714                5