How To Filter Pandas Dataframe

Question

I am creating a groupby object from a Pandas DataFrame and want to select out all the groups with > 1 size.

Example:

         A  B
    0  foo  0
    1  bar  1
    2  foo  2
    3  foo  3

The following doesn't seem to work:

    grouped = df.groupby('A')
    grouped[grouped.size > 1]

Expected Result:

    A
    foo 0
        2
        3

Hopefully some help: grouped.size().apply(lambda x: x>1), but I'm not sure how to do this

Oct 31, 2012 at 21:44
this is interesting... for a change I have hit an area where a feature needed by me is missing in Pandas... for long it was my understanding of it that was missing... great library for what I do.

Oct 31, 2012 at 21:51
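
The grouped.size() idea from the first comment can be carried through to a row filter. The following is only a sketch of one way to do that, not something proposed in the thread: the frame is rebuilt from the question's example, and the names sizes and result are illustrative.

```python
import pandas as pd

# Example frame reconstructed from the question
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# Series of group sizes, indexed by the group key ('bar' -> 1, 'foo' -> 3)
sizes = df.groupby("A").size()

# Look up each row's group size, then keep rows whose group has more than one member
result = df[df["A"].map(sizes) > 1]
print(result)
```

The boolean Series from grouped.size() > 1 is indexed by group key rather than by row, which is why it has to be mapped back onto the rows before it can be used as a mask.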

Sealander 2,949 3 gold badges 18 silver badges 19 bronze badges · Answer 1 · 2018-08-02 16:47:15Z

I have found transform to be much more efficient than filter for very big dataframes:

    element_group_sizes = df['A'].groupby(df['A']).transform('size')
    df[element_group_sizes>1]

Or, in one line:

    df[df['A'].groupby(df['A']).transform('size')>1]

answered Aug 2, 2018 at 16:47
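
A self-contained sketch of the transform approach from this answer, run on the question's example data. The frame is reconstructed from the question and the name filtered is illustrative. No timings are claimed here, so the efficiency point still needs to be measured on your own data.

```python
import pandas as pd

# Example frame reconstructed from the question
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# transform('size') broadcasts each group's size back onto every row of that group
element_group_sizes = df["A"].groupby(df["A"]).transform("size")
print(element_group_sizes.tolist())  # [3, 1, 3, 3]

# The result is aligned with df's index, so it can be used directly as a boolean mask
filtered = df[element_group_sizes > 1]
print(filtered)
```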


Great answer! Only you paid attention to efficiency. Bravo!

Apr 4, 2019 at 1:54
Shouldn't be element_group_sizes = df['A'].groupby('A')['A'].transform('size') instead?

May 24, 2019 at 12:05
@IgorFobia no -- df['A'] will be a Series and will no longer have a column 'A'. I suppose element_group_sizes = df.groupby('A')['A'].transform('size') would work though.

May 24, 2019 at 15:08
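
To make the two spellings in these comments concrete, here is a small check, assuming the question's example frame. The names via_series and via_frame are illustrative.

```python
import pandas as pd

# Example frame reconstructed from the question
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# Form used in the answer: group the Series by its own values
via_series = df["A"].groupby(df["A"]).transform("size")

# Form suggested in the comment: group the frame by column name, then select the column
via_frame = df.groupby("A")["A"].transform("size")

# Both give one size per row, in the original row order
print(via_series.tolist() == via_frame.tolist())
```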

elyase 36.7k 11 gold badges 101 silver badges 114 bronze badges · Answer 2 · 2013-08-15 21:13:15Z

As of pandas 0.12 you can do:

    >>> grouped.filter(lambda x: len(x) > 1)
         A  B
    0  foo  0
    2  foo  2
    3  foo  3

answered Aug 15, 2013 at 21:13


What is the 'x' in this instance? Does that refer to the column which you used to groupby?

Oct 17, 2013 at 23:45
x would be each subgroup of the groupby operation, which you can examine with grouped.groups. In case of a multicolumn groupby these subgroups refer to several columns, but this is irrelevant as len counts the rows in pandas objects.

Oct 18, 2013 at 8:45
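
A short sketch of what x is inside the filter callback, assuming the question's example frame. Iterating over the GroupBy object yields the same sub-DataFrames that filter passes to the lambda; the names sub and kept are illustrative.

```python
import pandas as pd

# Example frame reconstructed from the question
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})
grouped = df.groupby("A")

# grouped.groups maps each group key to the row labels belonging to that group
print(grouped.groups)

# Each iteration yields a key and the sub-DataFrame for that key;
# filter calls the lambda once per sub-DataFrame
for key, sub in grouped:
    print(key, type(sub).__name__, len(sub))

# len(x) is the number of rows in the sub-DataFrame, whatever its columns are
kept = grouped.filter(lambda x: len(x) > 1)
print(kept)
```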
Is there a way to get GroupBy object after filter, not a DataFrame? The only way I see now is to call groupby again, but this seems inefficient

Oct 27, 2015 at 15:56
@IvanVirabyan Worse, with categorical values the empty groups pop up again.

Mar 27, 2018 at 18:32
grouped.filter(lambda x: len(x.index) > 1) should be slightly faster

May 24, 2019 at 10:37

Mykola Zotko 11.8k 2 gold badges 36 silver badges 52 bronze badges · Answer 3 · 2021-02-13 18:07:31Z

You can use the method filter and the property shape:

    df.groupby('A').filter(lambda x: x.shape[0] > 1)

Chang She 15.7k 8 gold badges 39 silver badges 24 bronze badges · Answer 4 · 2012-11-01 17:00:54Z

If you still need a workaround:

    In [49]: pd.concat([group for _, group in grouped if len(group) > 1])
    Out[49]:
         A  B
    0  foo  0
    2  foo  2
    3  foo  3

answered Nov 1, 2012 at 17:00
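
The workaround above can be wrapped so it does not have to be rewritten each time. This is only a sketch: concat_large_groups and its min_size parameter are hypothetical names, not part of pandas, and the frame is reconstructed from the question.

```python
import pandas as pd


def concat_large_groups(df, key, min_size=2):
    """Hypothetical helper: keep rows whose group under `key` has at least `min_size` rows."""
    parts = [group for _, group in df.groupby(key) if len(group) >= min_size]
    # pd.concat raises ValueError on an empty list, so fall back to an empty frame
    return pd.concat(parts) if parts else df.iloc[0:0]


# Example frame reconstructed from the question
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

result = concat_large_groups(df, "A")
empty = concat_large_groups(df, "A", min_size=10)
print(result)
```

Note that the rows come back in group order rather than in the original row order, because the pieces are concatenated one group at a time.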


Thank you: that's what I had implemented for now, but it would be nice to know how to do filtering on grouped objects, coz that would be independent of writing a new list comprehension for each custom filtering case.

Nov 1, 2012 at 17:57
The issue #919 cited above would be a good solution once someone implements it

Nov 9, 2012 at 20:59
