df.groupby() - How to Aggregate Data

Pandas df groupby With Examples

Pandas groupby() groups the rows of a DataFrame by the values of one or more columns (the categories) and applies a function to each group. This makes it an efficient way to aggregate data.

It returns a GroupBy object, which you can then apply aggregation functions to, such as mean, sum, count, etc.

Pandas dataframe.groupby() function is used to split the data into groups based on some criteria. The pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.

You can combine groupby() with methods such as sum(), transform(), aggregate() (or its alias agg()), and apply(). You can also pass NumPy functions such as numpy.sum, numpy.mean, and numpy.max as the aggregating function. For these aggregations, the order of the rows within each group does not affect the result.
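As a minimal sketch of the difference between aggregating and transforming, the example below uses a small made-up DataFrame (the Team and Point values are hypothetical, not taken from the article's data). agg() returns one row per group, while transform() returns a value for every original row. The aggregations are passed as string names here; NumPy callables such as np.sum are accepted as well, although recent pandas releases may emit a warning suggesting the string form.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Team": ["a", "b", "a", "c", "b", "c"],
    "Point": [12, 7, 5, 9, 11, 13],
})

# agg(): several aggregations at once, one row per team
summary = df.groupby("Team")["Point"].agg(["sum", "mean", "max"])
print(summary)

# transform(): the group total is broadcast back to every original row
df["TeamTotal"] = df.groupby("Team")["Point"].transform("sum")
print(df)
```

Use agg() when you want a summary table, and transform() when you need the group-level value alongside each row, for example to compute each row's share of its team total.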

Python Pandas

Pandas is a Python package that makes importing and analyzing data much easier.

pandas df.groupby() Example

Let me explain with an example. I have data in df as follows:

Id   Month  Team  Point
001  Jan    a     12
…    …      …     …
009  Mar    c     13

I can use groupby() to aggregate the data over each team.

Syntax

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

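Parameters:

by: mapping, function, str, or iterable. Determines the groups, for example a column name or a list of column names.
axis: int, default 0. Split along rows (0) or columns (1). Deprecated since pandas 2.1.
level: If the axis is a MultiIndex, group by a particular level or levels.
as_index: For aggregated output, return an object with the group labels as the index. as_index=False gives effectively “SQL-style” grouped output, with the group labels as regular columns.
sort: Sort group keys. Get better performance by turning this off.
group_keys: When calling apply(), add group keys to the index to identify pieces.
squeeze: Reduce the dimensionality of the return type if possible, otherwise return a consistent type. Deprecated in pandas 1.1 and removed in pandas 2.0, so omit it on current versions.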

Returns: GroupBy object
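The effect of the sort parameter is easiest to see side by side. The sketch below uses made-up data with one missing team name; it also shows dropna, a parameter that is not in the signature above but is available in pandas 1.1 and later, which controls whether rows with a missing key are kept as their own group.

```python
import pandas as pd

# Hypothetical sample data; one row has no team
df = pd.DataFrame({
    "Team": ["c", "a", None, "a", "c"],
    "Point": [13, 12, 4, 5, 9],
})

# Default: group keys are sorted and rows with a missing key are dropped
default_sum = df.groupby("Team")["Point"].sum()

# sort=False: groups appear in order of first occurrence (c before a)
unsorted_sum = df.groupby("Team", sort=False)["Point"].sum()

# dropna=False (pandas >= 1.1): the missing key becomes its own group
with_missing = df.groupby("Team", dropna=False)["Point"].sum()

print(default_sum)
print(unsorted_sum)
print(with_missing)
```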

df.groupby() with Example

Pandas DataFrame.groupby() collects rows that share the same key value into groups so that you can run aggregate functions on each group. Let’s create a Python script that aggregates the data above by team name, assuming it is stored in a CSV file.

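# Import pandas as pd
import pandas as pd

# Create the dataframe from the CSV file
df = pd.read_csv("teams_data.csv")

# Group the rows by team
gk = df.groupby('Team')

# print(gk) would show only the GroupBy object, not the data,
# so print the first row of each group instead
print(gk.first())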
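Printing a GroupBy object shows only its type and memory address, because no computation happens until you ask for a result. The following sketch, built on a small hypothetical DataFrame instead of the CSV file, shows a few ways to look inside the groups.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "Team": ["a", "b", "a", "c", "b", "c"],
    "Point": [12, 7, 5, 9, 11, 13],
})

gk = df.groupby("Team")

# Number of distinct groups
n_groups = gk.ngroups

# All rows that belong to one group
team_a = gk.get_group("a")

# Iterating yields (group key, sub-DataFrame) pairs
sizes = {name: len(group) for name, group in gk}

print(n_groups)
print(team_a)
print(sizes)
```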

groupby() can be used in many different ways. It makes splitting a dataframe by one or more criteria simple and efficient.

Sort group key in descending order

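By default, groupby() sorts the group keys in ascending order. A GroupBy object has no sort_values() method, so to get descending order, aggregate first and then sort the result by its index.

groupedDF = df.groupby('Team')['Point'].sum()
sortedDF = groupedDF.sort_index(ascending=False)
print(sortedDF)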
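Sorting applies to the aggregated result, so you can order it either by the group key or by the aggregated value. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Team": ["a", "b", "a", "c", "b", "c"],
    "Point": [12, 7, 5, 9, 14, 3],
})

totals = df.groupby("Team")["Point"].sum()

# Order by group key, descending: c, b, a
by_key_desc = totals.sort_index(ascending=False)

# Order by total points, descending: b (21), a (17), c (12)
by_value_desc = totals.sort_values(ascending=False)

print(by_key_desc)
print(by_value_desc)
```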

groupby() to compute the sum

After df.groupby(), call sum() to total a column within each group. Here it adds up the points of each team.

df2 = df.groupby('Team')['Point'].sum()
print(df2)

df.groupby() count

You can also group by several columns at once and count the values in each group as follows:

df.groupby(['Team', 'Month'])['Point'].count()
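When grouping by several columns, the result is indexed by every combination of keys that occurs in the data. The sketch below, on hypothetical data with one missing value, also contrasts count(), which counts non-null values, with size(), which counts rows.

```python
import pandas as pd

# Hypothetical sample data; one Point value is missing
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "Team": ["a", "a", "a", "b", "b"],
    "Point": [12, None, 5, 9, 10],
})

# count(): number of non-null Point values per (Team, Month) pair
counts = df.groupby(["Team", "Month"])["Point"].count()

# size(): number of rows per (Team, Month) pair, nulls included
sizes = df.groupby(["Team", "Month"]).size()

print(counts)
print(sizes)
```

For team a in Jan, count() gives 1 and size() gives 2, because one of the two rows has no Point value.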

Reset Index on groupby results

After aggregation, the group keys become the index of the result. We can turn them back into a regular column with the reset_index() method.

df2 = df.groupby('Team')['Point'].sum().reset_index()
print(df2)
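There are two common ways to end up with the group key as an ordinary column: call reset_index() on the aggregated result, or pass as_index=False to groupby(). A short sketch on made-up data (the column name TotalPoint is a hypothetical choice):

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Team": ["a", "b", "a", "c", "b", "c"],
    "Point": [12, 7, 5, 9, 11, 13],
})

# Aggregation puts Team in the index of the resulting Series
totals = df.groupby("Team")["Point"].sum()

# reset_index() moves Team back into a column; name= labels the values
flat = totals.reset_index(name="TotalPoint")

# as_index=False keeps Team as a column from the start
flat_alt = df.groupby("Team", as_index=False)["Point"].sum()

print(flat)
print(flat_alt)
```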

Conclusion

I have covered the DataFrame.groupby() syntax with examples of how to group your data. I hope you have learned how to group by one or several columns, sort grouped results, compute sums and counts, and reset the index of the result.
