Excel Query - Find duplicates & delete non unique data

Excel Query - Find duplicates & delete non unique data

Author
Discussion

Muntu

Original Poster:

7,650 posts

205 months

Friday 18th August 2023
quotequote all
Hi

I have a spreadsheet with a few thousand IP addresses in column A, a list of the owning department in column B, and a list of the owners in column C.

Some IP addresses are owned by more than one department. These departments may have the same owner, or different owners

IP Address Department Owner
10.0.0.1 Sales Joe Bloggs
10.0.0.1 Production Joe Bloggs
10.0.0.1 Engineering Joe Bloggs
10.200.88.99 Sales Adam Ant
10.200.88.99 Sales Cindy Lating
10.200.88.99 Engineering Arthur Crown
10.200.88.99 Production Adam Ant
10.200.88.99 Production Arthur Crown


Where one IP has a single owner across a number of departments, (see 10.0.0.1 above) I need to delete two of the three rows

Where one IP has multiple different owners, some duplicated across departments, some not, I need to keep one instance for each owner, and delete the rest, so on 10.200.88.99 I would delete one row each for Adam and Arthur

So far I've conditionally formatted column A for duplicates, filtered individual IP, then manually deleted what isn't required.
20 minutes into this and I'm already sick of it. It's not the best use of my time - is there / what is a better way to do this?
Thanks!

biggiles

1,817 posts

231 months

Friday 18th August 2023
quotequote all
From what you've said... don't filter. Sort by IP. Then kill the rows you don't want. (Even by marking an X in column D, then later filtering and killing all rows with an X).

Random Account No6

5,046 posts

192 months

Friday 18th August 2023
quotequote all
You might be able to do it with a pivot table and putting a distinct count of the Dept in the value.

That would at least thin out a chunk of manual work, I think, although it is Friday.
I work loads better when I have data to cock about with in a spreadsheet smile

CobolMan

1,420 posts

213 months

Friday 18th August 2023
quotequote all
On the Data menu there's a remove duplicates option. Create a new tab, copy the data across, sort by columns A and C then, keeping all 3 columns selected, select remove duplicates using columns A and C.

imck

809 posts

113 months

Friday 18th August 2023
quotequote all
Pivot
Name and then IP in rows
Field Settings on Name and set to tabular in layout & print so IP in same row as Name

Only falls over if you have two people with the same name

colin79666

1,936 posts

119 months

Friday 18th August 2023
quotequote all
Another option would be create a column D that takes the contents of A and C together (=$A1+$C1) then use that to remove duplicates. Repeat as needed with column B in the picture.

Edited by colin79666 on Friday 18th August 20:31

Muntu

Original Poster:

7,650 posts

205 months

Saturday 19th August 2023
quotequote all
Thanks all for the input! Looks like I have a few options, I'll start with Colin79666's as it looks the simplest for me given my limited Excel knowledge

I really appreciate the responses, virtual pints all round!

Thank you beer

Hill92

4,466 posts

196 months

Saturday 19th August 2023
quotequote all
colin79666 said:
Another option would be create a column D that takes the contents of A and C together (=$A1+$C1) then use that to remove duplicates. Repeat as needed with column B in the picture.

Edited by colin79666 on Friday 18th August 20:31
Can combine with conditional formatting to highlight duplicates and then filter based on colour. Problem is it picks up the duplicates too.

If you sort the concatenated helper (btw formula should probably be =A1&C1, not =A1+C1) column first, you can do =d2=d1 in column E from row 2 onwards. This will identify if the concatenated value in Column D matches the line above and return TRUE/FALSE. You can then filter for FALSE to identify unique values.

Depending on your version of Excel the new =UNIQUE() formula may work

https://help.chi.ac.uk/excel-unique-function

But the most straightforward idea is imck's pivot table option.

iwantagta

1,323 posts

151 months

Saturday 19th August 2023
quotequote all
colin79666 said:
Another option would be create a column D that takes the contents of A and C together (=$A1+$C1) then use that to remove duplicates. Repeat as needed with column B in the picture.

Edited by colin79666 on Friday 18th August 20:31
This starts as i would. But not sure if the duplicates delete would work easily - thus i would:

I would then sort by D
then in column E - Formula in cell E3 of IF(D3=D2,1,0) - copy down.
Copy D&E paste special values.
Data sort by column E.
Sort by E
Delete all the rows with a 1.

Muntu

Original Poster:

7,650 posts

205 months

Saturday 19th August 2023
quotequote all
This is awesome, thanks for the additional info

Looks like my spreadsheet Sunday is going to be a lot shorted than I thought it would be when I finished work on Friday! smile

beer

Whoozit

3,750 posts

275 months

Saturday 19th August 2023
quotequote all
iwantagta said:
I would then sort by D
then in column E - Formula in cell E3 of IF(D3=D2,1,0) - copy down.
Copy D&E paste special values.
Data sort by column E.
Sort by E
Delete all the rows with a 1.
This. It'll take less time than anything else. For a one-off, it does the trick. I've used it many times in 30+ years to clear up dupes.

bigpriest

1,723 posts

136 months

Saturday 19th August 2023
quotequote all
If your data is in one column you can:

Data > Sort

Then:

Data > Filter > Advanced Filter
Check "Copy to Another Location"
In the Copy To box (click a free column on your sheet)
Tick "Unique records only"

OK