Wednesday, May 23, 2018

Basic pandas functions for dataset manipulation in Python


Suppose we have two tables, loaded as the pandas DataFrames data1 and data2:
data1.schema:
    name:
    Company:
    Address:
data2.schema:
    name:
    gender:
    Age:
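
To try the commands in this post, the two tables can be built by hand. This is only a sketch: the rows are made-up sample values, and only the column names come from the schemas above. It assumes the name column of data2 is spelled in lowercase, as the last command of the post uses it.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the post.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "Company": ["Acme", "Globex", "Initech", "Acme"],
    "Address": ["India", "USA", "China", "India"],
})

data2 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "gender": ["F", "M", "M", "F"],
    "Age": [29, 41, 35, 23],
})

print(data1.columns.tolist())
print(data2.columns.tolist())
```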

Commands:


# Delete a column (this changes data1 in place; the commands below assume 'name' is still present)

del data1["name"]
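
If the original table should stay untouched, one alternative is drop, which returns a new DataFrame instead of modifying the existing one. A minimal sketch with made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows for illustration.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben"],
    "Company": ["Acme", "Globex"],
    "Address": ["India", "USA"],
})

# drop() returns a copy without the column; data1 itself is unchanged.
trimmed = data1.drop(columns=["name"])

print(trimmed.columns.tolist())
print(data1.columns.tolist())
```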

# Return all the rows of the 'name' column

data1.loc[:, "name"]

# Return all the rows, for the columns from 'name' through 'Company' (both included)

data1.loc[:, "name":"Company"]

# Return all the rows, for the columns from 'Company' through 'Address' (both included)

data1.loc[:, "Company":"Address"]
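
Unlike ordinary Python slices, a label slice in .loc includes its end label. The sketch below, on made-up sample rows, shows this, and also shows that a single label gives a Series while a list of labels gives a DataFrame.

```python
import pandas as pd

# Hypothetical sample rows for illustration.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben"],
    "Company": ["Acme", "Globex"],
    "Address": ["India", "USA"],
})

# Label slices with .loc include both endpoints.
first_two = data1.loc[:, "name":"Company"]
print(first_two.columns.tolist())

# A single label returns a Series; a list of labels returns a DataFrame.
as_series = data1.loc[:, "name"]
as_frame = data1.loc[:, ["name"]]
print(type(as_series).__name__, type(as_frame).__name__)
```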

# Return the rows whose Address is "India"

data1.loc[lambda var: var.Address == "India", :]
or
data1.loc[data1["Address"] == "India", :]
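
Both forms select the same rows. The callable form receives the DataFrame as its argument, which is handy in a method chain where the intermediate table has no name. A small check on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows for illustration.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "Company": ["Acme", "Globex", "Initech", "Acme"],
    "Address": ["India", "USA", "China", "India"],
})

# Boolean mask: one True/False value per row.
mask = data1["Address"] == "India"
by_mask = data1.loc[mask, :]

# Callable: .loc calls it with the DataFrame and uses the returned mask.
by_callable = data1.loc[lambda df: df["Address"] == "India", :]

print(by_mask.equals(by_callable))
print(by_mask["name"].tolist())
```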

# Return only the names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"]

# Return only the first five names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"].head(5)

# Return only the last five names whose Address is "India"

data1.loc[lambda var: var.Address == "India", "name"].tail(5)

# Return the list of unique Addresses

data1.Address.unique()

# Return the list of unique Companies

data1.Company.unique()

# Return the list of unique names

data1.name.unique()

# Return the first unique Address

data1.Address.unique()[0]

# Return the total number of unique Addresses

data1.Address.unique().size
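
For the count alone, nunique gives the same number without building the array first, as long as the column has no missing values (by default nunique skips NaN, while unique keeps it). A sketch on made-up sample rows, using bracket access, which also works for column names that clash with DataFrame attributes:

```python
import pandas as pd

# Hypothetical sample rows for illustration.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "Company": ["Acme", "Globex", "Initech", "Acme"],
    "Address": ["India", "USA", "China", "India"],
})

# unique() returns the distinct values in order of first appearance.
addresses = data1["Address"].unique()
count_from_array = addresses.size

# nunique() returns only the number of distinct values.
count_direct = data1["Address"].nunique()

print(addresses.tolist(), count_from_array, count_direct)
```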

# Return the name, age and gender from data2 for the names whose Address in data1 is "India"

t = data1.loc[data1["Address"] == "India", "name"].unique()
for x in range(t.size):
    print(data2.loc[data2["name"] == t[x], :])
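
The same lookup can probably be done without a loop by using isin, which keeps every data2 row whose name appears in the list and returns them as a single table. This is a sketch on made-up sample rows, assuming data2 has a lowercase name column:

```python
import pandas as pd

# Hypothetical sample rows for illustration.
data1 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "Company": ["Acme", "Globex", "Initech", "Acme"],
    "Address": ["India", "USA", "China", "India"],
})
data2 = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Divya"],
    "gender": ["F", "M", "M", "F"],
    "Age": [29, 41, 35, 23],
})

# Names of the people in data1 whose Address is "India".
names_in_india = data1.loc[data1["Address"] == "India", "name"].unique()

# Keep the data2 rows for those names, with only the requested columns.
result = data2.loc[data2["name"].isin(names_in_india), ["name", "Age", "gender"]]
print(result)
```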
