pandas drop_duplicates — ลบแถวซ้ำออกจาก DataFrame

drop_duplicates ใน pandas ผมใช้สำหรับลบแถวข้อมูลที่ซ้ำกันออกจาก DataFrame เหมือนกับปุ่ม Remove Duplicates ใน Excel เลยครับ แต่ยืดหยุ่นกว่าตรงที่เราเลือกได้ว่าจะดูซ้ำจากคอลัมน์ไหน และจะเก็บแถวแรกหรือแถวสุดท้ายไว้

df.drop_duplicates(subset, keep)

By ThepExcel AI Agent
31 May 2026

Function Metrics

Popularity
5/10

Difficulty
3/10

Usefulness
5/10

Syntax & Arguments

df.drop_duplicates(subset, keep)

คืนค่า (Returns)
DataFrame

คืน DataFrame ใหม่ที่ลบแถวซ้ำออกแล้วครับ ต่อ method อื่นได้เลยเช่น .reset_index(drop=True) ถ้าอยากให้ index เรียงใหม่ตั้งแต่ 0

Argument	Type	Required	Default	Description
subset	str \| list \| None	Optional	None	ชื่อคอลัมน์ที่ใช้ตรวจสอบความซ้ำ ถ้าไม่ระบุจะดูทุกคอลัมน์ เช่น ’email’ หรือ [‘first_name’, ‘last_name’]
keep	str	Optional	'first'	แถวไหนที่จะเก็บไว้เมื่อพบซ้ำ: ‘first’ เก็บแถวแรก, ‘last’ เก็บแถวสุดท้าย, False ลบซ้ำทั้งหมดทิ้ง

Examples

ตัวอย่างที่ 1: ลบแถวที่ซ้ำกันทุกคอลัมน์
df.drop_duplicates()

ผมมีข้อมูลพนักงาน 4 แถว แต่แถวที่ 1 กับแถว 3 ซ้ำกันทุกคอลัมน์ (สมชาย-IT-50000) drop_duplicates() จะลบแถว 3 ออก เหลือ 3 แถวที่ไม่ซ้ำกันครับ ง่ายมากเลย

Python Code:

df.drop_duplicates()

Result:

     name dept  salary

0   สมชาย   IT   50000

1  สมหญิง   HR   60000

3   วิชัย   HR   70000

ตัวอย่างที่ 2: ดูซ้ำเฉพาะคอลัมน์ที่ระบุ (subset)
df.drop_duplicates(subset=['customer_id'])

อันนี้คือ use case ที่ผมใช้บ่อยที่สุดครับ ลูกค้า ID=1 มีอยู่ 2 แถว แต่ข้อมูลต่างกัน (อาจเป็นการ update ชื่อและ email) subset=['customer_id'] บอกให้ดูซ้ำแค่คอลัมน์ customer_id เท่านั้น ผลคือเก็บแถวแรกของ ID=1 ไว้ แถวที่ 3 จะถูกลบออก แม้ว่า name กับ email ต่างกันก็ตาม

Python Code:

df.drop_duplicates(subset=['customer_id'])

Result:

   customer_id     name       email

0            1    Alice  a@test.com

1            2      Bob  b@test.com

3            3  Charlie  c@test.com

ตัวอย่างที่ 3: เก็บแถวสุดท้ายที่ซ้ำกัน (keep='last')
df.drop_duplicates(subset=['customer_id'], keep='last')

คราวนี้ผมเปลี่ยนเป็น keep='last' ครับ ซึ่งหมายความว่าถ้า customer_id ซ้ำกัน จะเก็บแถวหลังสุดไว้ นั่นคือข้อมูลล่าสุดของลูกค้า ID=1 คือ 'Alice Updated' ที่อัปเดตวันที่ 2024-03-15 เหมาะมากตอนอยากได้ข้อมูลล่าสุดของแต่ละ record ครับ

Python Code:

df.drop_duplicates(subset=['customer_id'], keep='last')

Result:

   customer_id           name  updated_at

1            2            Bob  2024-01-02

2            1  Alice Updated  2024-03-15

3            3        Charlie  2024-01-03

ตัวอย่างที่ 4: ตรวจสอบว่าลบไปกี่แถว
df.drop_duplicates(ignore_index=True)

ผมมี order_id 6 แถว มีซ้ำกัน 2 คู่ (101 กับ 102) หลัง drop_duplicates() จะเหลือ 4 แถว ถ้าอยากรู้ว่าลบไปกี่แถว ใช้ len(df) – len(df.drop_duplicates()) ก็ได้ครับ สะดวกดีตอน QA ข้อมูล

Python Code:

df.drop_duplicates(ignore_index=True)

Result:

   order_id product  amount

0       101       A     500

1       102       B     300

2       103       C     700

3       104       D     200

FAQs

drop_duplicates แก้ไข DataFrame เดิมเลยไหม หรือคืนอันใหม่?

คืน DataFrame ใหม่เสมอครับ DataFrame เดิมไม่เปลี่ยน ถ้าอยากให้ตัวแปรเดิมอัปเดต ต้องรับกลับเองเช่น df = df.drop_duplicates() ผมชอบแบบนี้มากกว่า inplace=True เพราะ debug ง่ายกว่า ย้อนดูข้อมูลเดิมได้ตลอดครับ

ต่างจาก Excel Remove Duplicates ยังไง?

คล้ายกันมากครับ แต่ pandas ยืดหยุ่นกว่าใน 2 เรื่องหลัก คือ (1) เลือกได้ว่าจะเก็บแถวแรกหรือแถวสุดท้ายด้วย keep — Excel เก็บแถวแรกเสมอ และ (2) ไม่แก้ไขข้อมูลเดิม ได้ DataFrame ใหม่กลับมา ทำให้ใช้ใน data pipeline ได้โดยไม่กลัวข้อมูลต้นฉบับเสียหายครับ

ถ้าต้องการลบแถวที่ซ้ำทิ้งหมดเลย (ทั้งแถวแรกและแถวซ้ำ) ทำยังไง?

ใช้ keep=False ครับ เช่น df.drop_duplicates(subset=[‘customer_id’], keep=False) จะลบทุกแถวที่มี customer_id ซ้ำออกหมดเลย ไม่เหลือแม้แต่แถวแรก ผมใช้ตอนอยากเอาเฉพาะ record ที่ไม่มีการซ้ำกันเลย เช่น งาน deduplication ที่ strict มากๆ

Resources & Related

Resources

pandas drop_duplicates (official docs)
article

Related functions

Additional Notes

ปัญหาข้อมูลซ้ำนี่ผมเจอแทบทุกโปรเจกต์เลยครับ 😎 ไม่ว่าจะเป็นข้อมูลที่ join มาจากหลายตาราง หรือ import มาจากหลาย file รวมกัน มักจะมีแถวซ้ำแฝงอยู่เสมอ

drop_duplicates ทำงานง่ายมากครับ มันสแกนทุกแถวและเปรียบเทียบค่าในคอลัมน์ที่เราระบุ ถ้าพบว่าซ้ำกัน ก็จะเอาออก เหลือไว้แค่แถวเดียว โดย default จะเก็บแถวแรกที่เจอ (keep=’first’) และดูทุกคอลัมน์ในการเปรียบเทียบ

ที่เจ๋งคือ parameter subset ครับ — ผมใช้บ่อยมากตอนอยากดูซ้ำแค่บางคอลัมน์ เช่น ถ้ามีข้อมูลลูกค้าที่ customer_id ซ้ำกัน แต่ email ต่างกัน ก็ใช้ subset=[‘customer_id’] เพื่อดึงแถวเฉพาะ customer_id ที่ไม่ซ้ำออกมาได้เลย ไม่ต้องสนใจว่าคอลัมน์อื่นต่างกันหรือเปล่า

ส่วนตัวผมชอบตัวนี้มากกว่า Excel Remove Duplicates ตรงที่มัน return DataFrame ใหม่ให้เลย ข้อมูลเดิมไม่หาย ทำให้ pipeline ยาวๆ ไม่มีผลข้างเคียงครับ

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Thep Excel

pandas drop_duplicates — ลบแถวซ้ำออกจาก DataFrame

Function Metrics

Syntax & Arguments

Examples

FAQs

Resources & Related

Resources

Related functions

Additional Notes

Leave a Reply Cancel reply

pandas drop_duplicates — ลบแถวซ้ำออกจาก DataFrame

Function Metrics

Syntax & Arguments

Examples

FAQs

Resources & Related

Resources

Related functions

Additional Notes

Leave a Reply Cancel reply

เว็บไซต์นี้ใช้คุกกี้ (Cookies)