Text Mining

Dr. Somsak Chanaim (อ.ดร. สมศักดิ์ จันทร์เอม)

International College of Digital Innovation, Chiang Mai University

14 November 2025

Learning Objectives

Students will be able to…

Explain the basic concepts and working principles of Natural Language Processing (NLP).
Explain the fundamentals of Sentiment Analysis.
Identify and understand everyday applications of sentiment analysis.
Use basic tools for sentiment analysis.
Interpret sentiment analysis results clearly and accessibly.

What Is Text Mining?

Text Mining, also called Text Data Mining or Text Analytics, is the process of extracting useful insights, patterns, and knowledge from unstructured text data.

Text Mining combines techniques from Natural Language Processing (NLP), Machine Learning, and Statistics to convert text into structured data that can be analyzed further.

Applications of Text Mining

✅ Scenario

An e-commerce company such as Amazon wants to improve product quality and customer satisfaction. It receives thousands of product reviews per day, all of which are unstructured text.

🔑 Process

Data Collection:
- Collect reviews from the website, the app, or external platforms
Preprocessing:
- Remove stop words such as “the” and “is”
- Remove punctuation and irrelevant data
- Apply stemming/lemmatization, e.g. “running” → “run”
Text Mining Techniques:
- Sentiment Analysis → classify each review as positive 😊, negative 😡, or neutral 😐
- Topic Modeling (LDA) → find the main topics, e.g. “delivery”, “price”, “quality”
- Keyword Extraction (TF–IDF) → highlight the terms that characterize praise and complaints
Business Action:
- If many reviews mention “slow delivery” → the logistics team can investigate shipping
- If positive reviews mention “good packaging” → the marketing team can use this point in advertising

Amazon uses Text Mining for Product Review Analysis
- Sentiment analysis helps rank products and recommend suitable ones
- Negative reviews trigger alerts to the quality-control team
Starbucks uses Text Mining on Twitter and Instagram
- Detect popular flavors and complaints
- Adjust marketing strategy, e.g. launching seasonal drinks

Healthcare

Applications of Text Mining
🏥 Example

Healthcare: mining medical records and clinical notes to support diagnosis

✅ Scenario

Hospitals and clinics generate enormous amounts of unstructured text, such as

Electronic Health Records (EHRs)
Physicians' clinical notes
Laboratory test reports
Medical imaging (radiology) reports

This data is highly valuable but hard to analyze manually.

🔑 Process

Data Collection
- Extract data from medical records, discharge summaries, and physicians' notes
Preprocessing
- Remove stop words and standardize medical terminology
- Handle abbreviations, e.g. HTN → Hypertension
Text Mining Techniques
- Named Entity Recognition (NER): identify diseases, symptoms, and treatments mentioned in the text
- Text Classification: categorize medical notes by diagnosis type
- Clustering & Pattern Mining: find co-occurring patterns, e.g. diabetes + hypertension
- Predictive Modeling: predict risk from historical data, e.g. the risk of readmission
Business/Healthcare Impact
- Helps physicians detect disease patterns and diagnose faster
- Supports the creation of Personalized Treatment Plans
- Improves patient safety by detecting drug interactions
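
The abbreviation-handling step in the preprocessing list can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the lookup table `ABBREVIATIONS` and the function `expandAbbreviations` are hypothetical names, and only the HTN entry comes from the slide. A real system would rely on a curated clinical vocabulary and would need context to resolve ambiguous abbreviations.

```javascript
// Hypothetical abbreviation table: only HTN comes from the slide,
// the other entries are illustrative assumptions.
const ABBREVIATIONS = new Map([
  ["HTN", "hypertension"],
  ["DM", "diabetes mellitus"],
  ["SOB", "shortness of breath"],
]);

// Replace whole-word, all-caps abbreviations with their expanded form.
// Unknown abbreviations (e.g. "ECG" below) are left unchanged.
function expandAbbreviations(note) {
  return note.replace(/\b[A-Z]{2,5}\b/g, (abbr) => ABBREVIATIONS.get(abbr) ?? abbr);
}

const expanded = expandAbbreviations("Pt has HTN and DM, reports SOB. ECG normal.");
console.log(expanded);
```

Normalizing abbreviations before NER or classification means that "HTN" and "hypertension" end up as the same feature.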

IBM Watson Health: uses Text Mining techniques to extract key information from clinical notes to assist physicians in diagnosis
Mount Sinai Hospital (New York): applies NLP to EHR data to predict heart-failure risk earlier than traditional methods

Finance

Applications of Text Mining
🌍 Example

Finance: fraud detection and news sentiment analysis for stock price prediction

✅ Scenario

Financial institutions must handle enormous amounts of unstructured text, such as

Customer transaction records
Credit card statements
Financial news and analyst reports
Social media posts about stocks

This data contains hidden signals that can be used to detect fraud and to support investment forecasting.

🔑 How Text Mining Works in Finance

Fraud Detection
- Data Sources: transaction descriptions, merchant names, and customer complaint records
- Techniques:
  - Natural Language Processing (NLP) to analyze transaction text
  - Anomaly Detection to flag suspicious behavior
  - Classification Models, e.g. normal transaction ✅ vs. suspicious transaction ❌
- Outcome: real-time fraud alerts and reduced financial losses

News Sentiment for Stock Prediction
- Data Sources: news headlines, financial articles, and Twitter posts
- Techniques:
  - Sentiment Analysis: classify text as positive / negative / neutral
  - Named Entity Recognition (NER) to identify company names and stock tickers
  - Correlation analysis against market movements
- Outcome: helps investors anticipate the direction of stock prices and build sentiment-driven trading strategies

JPMorgan Chase 🏦
- Uses Text Mining + Machine Learning to analyze millions of client emails, chats, and documents for signs of fraud or insider trading
Bloomberg Terminal & Reuters 📰
- Use real-time sentiment analysis of global financial news
- Traders are alerted when market sentiment toward a stock or commodity shifts abruptly
S&P Global Market Intelligence 📈
- Uses NLP to analyze earnings call transcripts
- Analysts can detect executives' tone and sentiment as an early signal of company performance

Education & Research

Applications of Text Mining
🌍 Example

Education and research: article summarization, plagiarism detection, and Learning Analytics 🎓📚

✅ Scenario

Universities and researchers must handle enormous amounts of unstructured text, such as

Research papers and academic articles
Student reports and essays
Online learning logs and discussion-board posts

Text Mining makes it possible to process and analyze this data efficiently.

🔑 How Text Mining Works in Education and Research

Summarizing Articles
- NLP algorithms generate short summaries of long research papers
- Saves students and researchers time when searching large databases
- Example: Elsevier uses AI to summarize content on its academic platforms
Plagiarism Detection
- The system compares a student's work against millions of documents
- Detects text that was copied or heavily paraphrased
- Example: Turnitin uses Text Mining + Similarity Analysis to compare texts
Learning Analytics
- Analyze learners' forum posts, reports, or quiz answers
- Identify “at-risk students” from writing patterns or engagement levels
- Example: Moodle Analytics and Coursera use NLP to track learner progress
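
One common way to make "similarity analysis" concrete is Jaccard similarity over word n-grams (shingles). The sketch below is a generic textbook technique, not a description of how Turnitin or any other product works. The function names, the shingle size of 3, and the two sample sentences are assumptions chosen for illustration.

```javascript
// Build the set of word n-grams ("shingles") that occur in a text.
function shingles(text, n = 3) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const out = new Set();
  for (let i = 0; i + n <= words.length; i++) {
    out.add(words.slice(i, i + n).join(" "));
  }
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, a value in [0, 1].
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const s of a) if (b.has(s)) shared++;
  return shared / (a.size + b.size - shared);
}

// Hypothetical source passage and student sentence.
const source = "Text mining extracts useful patterns from unstructured text data.";
const essay = "Text mining extracts useful patterns from large document collections.";
const similarity = jaccard(shingles(source), shingles(essay));
console.log(similarity.toFixed(2)); // 4 shared shingles out of 10 distinct ones
```

Exact-shingle overlap catches copied passages but misses paraphrases, which is why production systems combine it with other signals.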

Turnitin → plagiarism checking against millions of student documents worldwide
Coursera & edX → analyze forum discussions to improve course design
Semantic Scholar (Allen Institute for AI) → uses NLP to summarize and recommend related research papers

Natural Language Processing

Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that aims to enable computers to understand, interpret, and generate human language, in both spoken and written form.

NLP combines techniques from Linguistics, Computer Science, and Machine Learning to bridge the gap between human communication and computer understanding.

Core Capabilities of NLP

Text Preprocessing → Tokenization, Stemming, Lemmatization, Stop-word Removal
Text Classification → spam detection, sentiment analysis, topic labeling
Named Entity Recognition (NER) → identifying people, places, dates, organizations
Machine Translation → e.g. Google Translate, DeepL
Sentiment Analysis → detecting sentiment such as positive, negative, or neutral
Speech Recognition → converting speech to text, e.g. Siri, Alexa
Text Generation → chatbots and large language models, e.g. ChatGPT, Gemini ✨

Sentiment Analysis

Sentiment Analysis is a technique for detecting the “emotional tone” of a text. It lets a computer determine whether the text is positive, negative, or neutral.

Example Sentences and Sentiment Scores

Typically, each text is assigned a Sentiment Score between –1 (very negative) and +1 (very positive).

Example 1

Sentence:

“The movie was fantastic and inspiring.”

Sentiment Value: +0.85 (strongly positive)

Example 2

Sentence:

“The service was terrible and disappointing.”

Sentiment Value: –0.80 (strongly negative)

Example 3

Sentence:

“The food was okay, nothing special.”

Sentiment Value: +0.05 (neutral / slightly positive)
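
To show how a score can end up in the range –1 to +1, here is a minimal lexicon-based sketch. It divides the difference between positive and negative word hits by the total number of hits. The word lists and the `polarity` function are hypothetical. The slide does not say which tool produced +0.85, –0.80, and +0.05, so this sketch only reproduces the sign of those scores, not the values.

```javascript
// Tiny illustrative lexicons (assumed for this sketch; real lexicons hold thousands of entries).
const POSITIVE = new Set(["fantastic", "inspiring", "great", "love", "delicious"]);
const NEGATIVE = new Set(["terrible", "disappointing", "awful", "slow"]);

// Score = (positive hits - negative hits) / (all hits), which always lies in [-1, +1].
// A text with no sentiment words scores 0.
function polarity(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) ?? [];
  let pos = 0;
  let neg = 0;
  for (const w of words) {
    if (POSITIVE.has(w)) pos++;
    else if (NEGATIVE.has(w)) neg++;
  }
  const hits = pos + neg;
  return hits === 0 ? 0 : (pos - neg) / hits;
}

console.log(polarity("The movie was fantastic and inspiring."));      // both hits are positive
console.log(polarity("The service was terrible and disappointing.")); // both hits are negative
console.log(polarity("The food was okay, nothing special."));         // no lexicon hits
```

Graded values such as +0.85 typically come from weighted lexicons or from the output probability of a trained model.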

1. Standard Sentiment Analysis (SSA)

Task: classify text into three main categories: Positive, Negative, or Neutral

Example

Text: “The food was delicious.” → Positive
Text: “The service was slow.” → Negative

2. Fine-grained Sentiment Analysis (an extension of SSA)

Task: split sentiment polarity into several levels to capture the intensity of the sentiment more precisely

Sentiment levels and example indicators

Very Positive: 😍 / 🤩 / 🥳 / ⭐⭐⭐⭐⭐
Positive: 🙂 / 😊 / ⭐⭐⭐⭐
Neutral: 😐 / 😶 / ⭐⭐⭐
Negative: 🙁 / 😟 / ⭐⭐
Very Negative: 😡 / 😠 / 😭 / ⭐

Example sentences

Text: “The movie was absolutely amazing!” → Very Positive
Text: “The product is okay.” → Neutral
Text: “This was the worst experience ever!” → Very Negative
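
If a score in [–1, +1] is already available, the five levels can be obtained by binning it. The cut-offs below (±0.2 and ±0.6) and the function name `fineGrainedLabel` are assumptions for illustration. The slide does not specify thresholds, and in practice they are tuned on labeled data.

```javascript
// Map a sentiment score in [-1, +1] to one of five levels.
// The thresholds are illustrative assumptions, not values from the slide.
function fineGrainedLabel(score) {
  if (score >= 0.6) return "Very Positive";
  if (score >= 0.2) return "Positive";
  if (score > -0.2) return "Neutral";
  if (score > -0.6) return "Negative";
  return "Very Negative";
}

for (const s of [0.85, 0.3, 0.05, -0.3, -0.8]) {
  console.log(s, "→", fineGrainedLabel(s));
}
```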

3. Emotion Detection

Task: use NLP techniques together with psychological models to classify the specific emotion expressed in a text

Common emotion categories

Happiness
Anger
Sadness
Fear
Surprise
Disgust
etc.

Examples

I’m so excited for my new job! → Joy/Excitement 😀🤩
I’m scared about the results. → Fear 😨
This food tastes terrible. → Disgust 🤢
Wow, I didn’t expect that surprise party! → Surprise 😲

4. Aspect-Based Sentiment Analysis (ABSA)

This analysis looks for specific aspects or features of a product/service in order to identify which part of the product the sentiment in the text refers to.

Task: identify the part of the product/service that the sentiment is about

Example aspect sentiments:

Camera → Positive
Battery → Negative
Price → Neutral
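
A very rough way to approximate ABSA is to split a review into clauses and attach each clause's sentiment to the aspect words it contains. The aspect list, the word lists, the sample review, and the `aspectSentiments` function are all hypothetical. Practical ABSA systems use trained models rather than clause splitting.

```javascript
// Illustrative aspect terms and sentiment words (assumptions for this sketch).
const ASPECTS = ["camera", "battery", "price"];
const POS = new Set(["great", "good", "excellent", "amazing"]);
const NEG = new Set(["bad", "poor", "terrible", "weak"]);

function aspectSentiments(review) {
  const result = {};
  // Treat each clause (split on "but" and punctuation) as the sentiment scope
  // of the aspects it mentions.
  const clauses = review.toLowerCase().split(/\bbut\b|[.,;!?]/);
  for (const clause of clauses) {
    const words = clause.match(/[a-z']+/g) ?? [];
    let score = 0;
    for (const w of words) {
      if (POS.has(w)) score += 1;
      else if (NEG.has(w)) score -= 1;
    }
    const label = score > 0 ? "Positive" : score < 0 ? "Negative" : "Neutral";
    for (const aspect of ASPECTS) {
      if (words.includes(aspect)) result[aspect] = label;
    }
  }
  return result;
}

// Hypothetical review that matches the three aspects listed above.
const absa = aspectSentiments("The camera is great but the battery is bad. The price is average.");
console.log(absa);
```

A single overall label would blur this review into "mixed", while the per-aspect view keeps the camera praise and the battery complaint separate.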

Summary

SSA → classifies as Positive / Negative / Neutral
Fine-grained → adds intensity levels (Very Positive → Very Negative)
Emotion Detection → identifies specific emotions (e.g. happiness, anger, fear)
ABSA → links sentiment to specific features of a product/service

Interactive Sentiment Analysis (Demo)

(async () => {
  // ========== SHELL ==========
  const box = html`<div style="max-width:1200px;font:14px system-ui;">
    <style>
      .grid { display:grid; grid-template-columns: 340px 1fr; gap:16px; }
      .card { background:#fff; border:1px solid #ddd; border-radius:10px; padding:12px; }
      .row { display:flex; gap:10px; align-items:center; flex-wrap:wrap; }
      .pill { display:inline-block; padding:2px 8px; border-radius:999px; font:12px system-ui; border:1px solid #ddd; }
      .tok { padding:1px 4px; border-radius:6px; margin:2px 3px; display:inline-block; }
      .tok.pos { background:#e7f6ec; border:1px solid #b8e0c6; }
      .tok.neg { background:#fde7e7; border:1px solid #f6bcbc; }
      .tok.neu { background:#f1f1f1; border:1px solid #e2e2e2; }
      .bar { height:12px; background:#eee; border-radius:999px; overflow:hidden; }
      .bar > div { height:100%; }
      .mono { font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace; }
      textarea { width:100%; min-height:120px; font:13px/1.4 system-ui; }
      .hint { font-size:12px; color:#666; }
      table.dict { width:100%; border-collapse:collapse; }
      table.dict th, table.dict td { border-bottom:1px solid #eee; padding:6px 4px; text-align:left; }
      table.dict th { border-bottom:1px solid #ddd; }
      .badge { font:11px system-ui; border:1px solid #ddd; padding:1px 6px; border-radius:999px; }
    </style>

    <div class="grid">
      <div class="card">
        <div class="row"><span class="pill">Sentiment Lab</span></div>

        <div style="margin-top:10px">
          <label>Language</label>
          <div class="row" id="langRow"></div>

          <label style="display:block;margin-top:8px">Mode</label>
          <div class="row" id="modeRow"></div>

          <div id="singleWrap" style="margin-top:8px"></div>
          <div id="batchWrap"  style="display:none; margin-top:8px"></div>

          <div style="margin-top:10px">
            <b>Parameters</b>
            <div class="row" id="paramRow"></div>
            <div style="margin-top:6px" id="threshRow"></div>
          </div>

          <div style="margin-top:8px" id="btnRow"></div>
          <div class="hint" style="margin-top:6px">Tip: Batch mode → one sentence per line; threshold controls what’s considered Neutral.</div>
        </div>
      </div>

      <div class="card">
        <div class="row"><span class="pill">Results</span></div>
        <div id="summary" style="margin-top:8px"></div>
        <div id="viz" style="margin-top:10px"></div>
      </div>
    </div>

    <div class="card" style="margin-top:16px">
      <div class="row" style="justify-content:space-between; align-items:center;">
        <span class="pill">Detectable Terms Dictionary</span>
        <div id="dictCtrl" class="row"></div>
      </div>
      <div id="dictTable" style="margin-top:8px"></div>
    </div>
  </div>`;

  // ---------- Controls ----------
  const langSel   = Inputs.radio(["English","Thai"], {value:"English"});
  const modeSel   = Inputs.radio(["Single text","Batch (one per line)"], {value:"Single text"});
  const textSingle= Inputs.textarea({label:"Text",  value:"I absolutely love this! But the battery isn't great.", rows:6});
  const textBatch = Inputs.textarea({label:"Lines", value:"I love this so much!\nThis is not good at all.\nใช้งานง่ายมากเลย ชอบ!\nไม่ค่อยดีเท่าไหร่", rows:8});

  const negToggle   = Inputs.toggle({label:"Negation (not/ไม่)", value:true});
  const intToggle   = Inputs.toggle({label:"Intensifiers (!!, very, มาก)", value:true});
  const emojiToggle = Inputs.toggle({label:"Emoji cues 🙂😢", value:true});
  const threshRange = Inputs.range([0, 1], {label:"Neutral zone ±", step:0.05, value:0.15});
  const analyzeBtn  = Inputs.button("Analyze");

  box.querySelector("#langRow").append(langSel);
  box.querySelector("#modeRow").append(modeSel);
  box.querySelector("#singleWrap").append(textSingle);
  box.querySelector("#batchWrap").append(textBatch);
  box.querySelector("#paramRow").append(negToggle, intToggle, emojiToggle);
  box.querySelector("#threshRow").append(threshRange);
  box.querySelector("#btnRow").append(analyzeBtn);

  function syncMode() {
    const singleWrap = box.querySelector("#singleWrap");
    const batchWrap  = box.querySelector("#batchWrap");
    const mode = modeSel.value;
    singleWrap.style.display = (mode==="Single text") ? "" : "none";
    batchWrap.style.display  = (mode==="Batch (one per line)") ? "" : "none";
  }
  modeSel.addEventListener("input", syncMode);
  syncMode();

  const summary = box.querySelector("#summary");
  const viz     = box.querySelector("#viz");

  // ========== LEXICONS (Expanded) ==========
  const LEX_EN = {
    pos: [
      "love","great","good","amazing","awesome","nice","happy","excellent","like","fantastic","cool",
      "wonderful","brilliant","delightful","impressive","superb","satisfying","pleasant","lovely","marvelous"
    ],
    neg: [
      "bad","terrible","awful","hate","worse","worst","boring","slow","buggy","disappoint","poor",
      "horrible","mediocre","useless","unreliable","frustrating","laggy","expensive","crash","broken"
    ]
  };
  const LEX_TH = {
    pos: [
      "ชอบ","ดีมาก","ดี","เยี่ยม","สุดยอด","ประทับใจ","โอเค","ง่าย","เจ๋ง","สุดยอดมาก","รัก",
      "ประเสริฐ","โอเคมาก","แจ่ม","ปัง","เริ่ด","คุ้มค่า","ประหยัดเวลา","น่าพอใจ","สวยงาม"
    ],
    neg: [
      "แย่","ไม่ดี","แย่มาก","ช้า","ห่วย","ผิดหวัง","น่าเบื่อ","งง","เลว","โคตรแย่","พัง",
      "หงุดหงิด","ปวดหัว","ห่วยแตก","แพง","ใช้งานไม่ได้","บั๊ก","หลุด","ค้าง","ล้มเหลว"
    ]
  };

  // Emoji cues
  const EMOJI = {
    "😀":2, "🙂":1.5, "😍":2.5, "😂":1.5, "😢":-2, "😡":-2.5, "😭":-2.5, "👍":1.5, "👎":-1.5,
    "🔥":1.5, "💔":-1.5, "✨":1.2, "🤩":2.0, "🤮":-2.0
  };

  // Intensifiers / Negations
  const BOOST_EN = new Set(["very","really","so","extremely","super","highly","truly","incredibly","insanely"]);
  const BOOST_TH = new Set(["มาก","มากๆ","สุดๆ","โคตร","สุดสุด","อย่างยิ่ง","สุดยอด"]);
  const NEG_EN   = new Set(["not","no","never","n't"]);
  const NEG_TH   = new Set(["ไม่","ไม่ได้","ไม่มี","มิได้","มิใช่","ไม่มีทาง"]);

  // ========== Core helpers ==========
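  const SCORE = (w, lang) => {
    const L = (lang==="Thai") ? LEX_TH : LEX_EN;
    // The English tokenizer keeps "!" attached to words ("good!"), so strip it before the lookup.
    const key = w.replace(/!+$/, "");
    if (L.pos.includes(key)) return +3;
    if (L.neg.includes(key)) return -3;
    return 0;
  };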

  function tokenize(text, lang){
    if (lang==="Thai"){
      const raw = text.replace(/[.,!?()";:]/g, " ").split(/\s+/).filter(Boolean);
      return raw;
    }
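    // Normalize curly apostrophes first so contractions like "isn’t" stay one token.
    return text.toLowerCase().replace(/’/g, "'").replace(/[^a-z0-9\s'!🙂😀😍😂😢😡😭👍👎🔥💔✨🤩🤮]/g," ")
      .split(/\s+/).filter(Boolean);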
  }

  function analyzeOne(text, lang, opts){
    const negSet   = (lang==="Thai") ? NEG_TH : NEG_EN;
    const boostSet = (lang==="Thai") ? BOOST_TH : BOOST_EN;

    const emojis = Array.from(text).filter(c=> EMOJI[c]);
    const toks = tokenize(text, lang);
    const rows = [];

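    // simple negation scope (next 1–2 tokens)
    let i=0;
    while(i<toks.length){
      const t = toks[i];
      const tNorm = t.replace(/’/g,"'");
      // Match standalone negators ("not", "ไม่") and English contractions such as "isn't",
      // which the tokenizer keeps as a single token.
      const isNeg = opts.neg && (negSet.has(tNorm) || (lang!=="Thai" && tNorm.endsWith("n't")));
      if (isNeg){
        const nextN = Math.min(2, toks.length - i - 1);
        // Record the negator first so rows stay in reading order.
        rows.push({tok:t, base:0, contrib:0, effect:"negator"});
        for (let k=1; k<=nextN; k++){
          const w = toks[i+k]; const s = SCORE(w, lang);
          rows.push({tok:w, base:s, contrib: s ? -s : 0, effect:"negation"});
        }
        i += (1 + nextN);
        continue;
      }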

      // base + intensifier (look-behind)
      let s = SCORE(t, lang);
      const prev = toks[i-1];
      if (opts.intens && prev && boostSet.has(prev)){
        s = s ? s*1.5 : 0;
      }
      rows.push({tok:t, base:s, contrib:s, effect: s? "base":"none"});
      i++;
    }

    // emoji contribution
    if (opts.emoji && emojis.length){
      const emoSum = emojis.reduce((acc,e)=> acc + (EMOJI[e]||0), 0);
      rows.push({tok: emojis.join(""), base:emoSum, contrib:emoSum, effect:"emoji"});
    }

    const raw = rows.reduce((a,b)=> a + (b.contrib||0), 0);
    return { rows, raw };
  }

  function labelFromScore(s, nz){
    if (s >  nz) return ["Positive","#26a269"];
    if (s < -nz) return ["Negative","#c01c28"];
    return ["Neutral","#777"];
  }

  function renderSingle(text, lang, opts, neutralZone){
    const {rows, raw} = analyzeOne(text, lang, opts);
    const [lab, col] = labelFromScore(raw, neutralZone);

    summary.innerHTML = `
      <div class="row">
        <div class="pill">Language: <b>${lang}</b></div>
        <div class="pill">Score: <b class="mono">${raw.toFixed(2)}</b></div>
        <div class="pill">Label: <b style="color:${col}">${lab}</b></div>
      </div>
      <div style="margin-top:6px" class="bar">
        <div style="width:${Math.max(0, Math.min(100,(raw+4)/8*100))}%; background:${col}"></div>
      </div>
      <div class="hint" style="margin-top:6px">The bar shows the raw score clamped to [-4, 4]. Neutral if |score| ≤ ${neutralZone}.</div>
    `;

    const data = rows.filter(r=>r.tok.trim().length)
                     .map((r,i)=>({i, tok:r.tok, contrib:r.contrib||0, effect:r.effect, sign: Math.sign(r.contrib||0)}));

    viz.innerHTML = "";
    const W = 820, H = 280;
    const fig1 = Plot.plot({
      width: W, height: H, grid: true,
      x: {label: "token index"},
      y: {label: "contribution", domain: [-4.5,4.5]},
      marks: [
        Plot.ruleY([0]),
        Plot.barY(data, {x:"i", y:"contrib", fill: d => d.sign>0 ? "#42b883" : (d.sign<0 ? "#e76f51" : "#bbb")}),
        Plot.text(data, {x:"i", y:d=>d.contrib>0? d.contrib+0.15 : d.contrib-0.15, text:"tok", fontSize:11, textAnchor:"middle"})
      ]
    });

    const line = document.createElement("div");
    line.style.marginTop = "8px";
    for (const r of data){
      const cls = r.contrib>0 ? "pos" : (r.contrib<0 ? "neg" : "neu");
      // Build the element with textContent/title so token text is never parsed as HTML.
      const span = document.createElement("span");
      span.className = `tok ${cls}`;
      span.title = `${r.tok} (${(r.contrib||0).toFixed(2)} ${r.effect})`;
      span.textContent = r.tok;
      line.append(span);
    }

    viz.append(fig1, line);
  }

  function renderBatch(lines, lang, opts, neutralZone){
    const rows = [];
    for (const s of lines){
      const {raw} = analyzeOne(s, lang, opts);
      const [lab] = labelFromScore(raw, neutralZone);
      rows.push({text:s, score:raw, label:lab});
    }

    summary.innerHTML = `
      <div class="row">
        <div class="pill">Language: <b>${lang}</b></div>
        <div class="pill">Samples: <b class="mono">${rows.length}</b></div>
      </div>
      <div class="hint" style="margin-top:6px">Neutral if |score| ≤ ${neutralZone}. Drag the threshold in the sidebar to see label flips.</div>
    `;

    viz.innerHTML = "";
    const W = 820, H = 260;
    const figH = Plot.plot({
      width: W, height: H, grid:true,
      x: {label:"score"},
      y: {label:"count"},
      marks: [
        Plot.rectY(rows, Plot.binY({y:"count"}, {x:"score", thresholds:16})),
        Plot.ruleX([neutralZone, -neutralZone], {stroke:"#999", strokeDasharray:"4,4"}),
        Plot.text([`+${neutralZone}`], {x:neutralZone, y:0, dy:-8}),
        Plot.text([`-${neutralZone}`], {x:-neutralZone, y:0, dy:-8})
      ]
    });

    // table
    const tbl = html`<table style="width:100%; border-collapse:collapse; margin-top:8px;">
      <thead><tr>
        <th style="border-bottom:1px solid #ddd; text-align:left">Text</th>
        <th style="border-bottom:1px solid #ddd; text-align:right">Score</th>
        <th style="border-bottom:1px solid #ddd; text-align:left">Label</th>
      </tr></thead>
      <tbody></tbody>
    </table>`;
    const tb = tbl.querySelector("tbody");
    for (const r of rows){
      const [_, col] = labelFromScore(r.score, neutralZone);
      tb.insertAdjacentHTML("beforeend",
        `<tr>
           <td style="border-bottom:1px solid #eee; padding:4px 0">${r.text.replace(/</g,"&lt;")}</td>
           <td class="mono" style="border-bottom:1px solid #eee; text-align:right">${r.score.toFixed(2)}</td>
           <td style="border-bottom:1px solid #eee; color:${col}">${r.label}</td>
         </tr>`);
    }

    viz.append(figH, tbl);
  }

  function run(){
    const lang = langSel.value;
    const mode = modeSel.value;
    const opts = {
      neg:    !!negToggle.value,
      intens: !!intToggle.value,
      emoji:  !!emojiToggle.value
    };
    const nz = +threshRange.value;

    if (mode === "Single text"){
      renderSingle(textSingle.value || "", lang, opts, nz);
    } else {
      const lines = (textBatch.value || "").split(/\r?\n/).map(s=>s.trim()).filter(Boolean);
      renderBatch(lines, lang, opts, nz);
    }
  }

  analyzeBtn.addEventListener("click", run);
  run(); // initial

  // ======== Dictionary Table (Show/Hide + CSV export) ========
  const dictTable = box.querySelector("#dictTable");
  const dictCtrl  = box.querySelector("#dictCtrl");

  const showTblToggle = Inputs.toggle({ label: "Show table", value: false });
  const langFilter = Inputs.radio(["All","English","Thai"], {value:"All"});
  const typeFilter = Inputs.select(["All","positive","negative","intensifier","negation","emoji"], {value:"All"});
  const copyBtn    = Inputs.button("Copy CSV");
  const downloadBtn= Inputs.button("Download CSV");

  dictCtrl.append(
    showTblToggle,
    html`<span class="badge">Filter:</span>`,
    langFilter,
    typeFilter,
    copyBtn,
    downloadBtn
  );

  function buildDictRows(){
    const rows = [];

    // EN
    for (const w of LEX_EN.pos) rows.push({language:"English", type:"positive", term:w, value:3});
    for (const w of LEX_EN.neg) rows.push({language:"English", type:"negative", term:w, value:-3});
    for (const w of BOOST_EN)   rows.push({language:"English", type:"intensifier", term:w, value:"×1.5"});
    for (const w of NEG_EN)     rows.push({language:"English", type:"negation", term:w, value:"flip"});
    for (const [emo,sc] of Object.entries(EMOJI)) rows.push({language:"English", type:"emoji", term:emo, value:sc});

    // TH
    for (const w of LEX_TH.pos) rows.push({language:"Thai", type:"positive", term:w, value:3});
    for (const w of LEX_TH.neg) rows.push({language:"Thai", type:"negative", term:w, value:-3});
    for (const w of BOOST_TH)   rows.push({language:"Thai", type:"intensifier", term:w, value:"×1.5"});
    for (const w of NEG_TH)     rows.push({language:"Thai", type:"negation", term:w, value:"flip"});

    return rows;
  }

  function renderDict(){
    const lf = langFilter.value;
    const tf = typeFilter.value;

    const all = buildDictRows().filter(r =>
      (lf==="All"  || r.language===lf) &&
      (tf==="All"  || r.type===tf)
    );

    const tbl = html`<table class="dict">
      <thead>
        <tr>
          <th>Language</th><th>Type</th><th>Term</th><th>Value</th>
        </tr>
      </thead>
      <tbody></tbody>
    </table>`;
    const tb = tbl.querySelector("tbody");

    for (const r of all){
      tb.insertAdjacentHTML("beforeend",
        `<tr>
          <td>${r.language}</td>
          <td>${r.type}</td>
          <td class="mono">${r.term.replace(/</g,"&lt;")}</td>
          <td class="mono">${r.value}</td>
        </tr>`);
    }

    dictTable.innerHTML = "";
    dictTable.append(tbl);
  }

  function toCSV(rows){
    const header = ["language","type","term","value"];
    const esc = v => `"${String(v).replace(/"/g,'""')}"`;
    const lines = [header.map(esc).join(",")].concat(
      rows.map(r => [esc(r.language),esc(r.type),esc(r.term),esc(r.value)].join(","))
    );
    return lines.join("\n");
  }

  copyBtn.addEventListener("click", async () => {
    const lf = langFilter.value;
    const tf = typeFilter.value;
    const rows = buildDictRows().filter(r =>
      (lf==="All"  || r.language===lf) &&
      (tf==="All"  || r.type===tf)
    );
    const csv = toCSV(rows);
    try {
      await navigator.clipboard.writeText(csv);
      copyBtn.textContent = "Copied!";
      setTimeout(()=> copyBtn.textContent = "Copy CSV", 1000);
    } catch {
      alert(csv); // fallback
    }
  });

  downloadBtn.addEventListener("click", () => {
    const lf = langFilter.value;
    const tf = typeFilter.value;
    const rows = buildDictRows().filter(r =>
      (lf==="All"  || r.language===lf) &&
      (tf==="All"  || r.type===tf)
    );
    const csv = toCSV(rows);
    const blob = new Blob([csv], {type:"text/csv;charset=utf-8"});
    const url = URL.createObjectURL(blob);
    const a = document.createElement("a");
    const ts = new Date().toISOString().slice(0,19).replace(/[:T]/g,"-");
    a.href = url;
    a.download = `sentiment-dictionary-${lf}-${tf}-${ts}.csv`;
    document.body.appendChild(a);
    a.click();
    a.remove();
    URL.revokeObjectURL(url);
  });

  function syncDictVisibility(){
    const on = !!showTblToggle.value;
    dictTable.style.display = on ? "" : "none";
  }
  showTblToggle.addEventListener("input", syncDictVisibility);

  langFilter.addEventListener("input", renderDict);
  typeFilter.addEventListener("input", renderDict);

  renderDict();
  syncDictVisibility();

  return box;
})()

Example

I absolutely love this product—super easy to use! 🙂
The app is good, but the battery life is not great.
This update is incredibly fast and really impressive.
It’s not bad, just a bit slow sometimes.
The UX is terrible… I’m so disappointed. 👎
ใช้งานง่ายมาก ชอบฟีเจอร์ใหม่ที่สุด!
ไม่ดีเท่าไหร่ แถมค้างบ่อยๆ จนหงุดหงิด 😡
บริการโอเคนะ แต่ไม่ได้เร็วมาก
ราคาแพงไปนิด แต่คุณภาพก็ดีมากจริงๆ
Nothing special—works as expected.

Workflow of Sentiment Analysis

Preprocessing Steps: cleaning, normalizing, and structuring the text

Tokenization
- Splitting a sentence into smaller units (tokens) such as words or phrases
- Example: “The movie was great” → [“The”, “movie”, “was”, “great”]
Lowercasing / Normalization
- Converting all text to lowercase to reduce redundancy
- Example: “Great” and “great” are treated as the same word
Stop-word Removal
- Removing words that occur frequently but carry little meaning
- Example: “the”, “is”, “and”, “of”
Stemming
- Reducing a word to its stem by stripping suffixes
- Example: “running”, “runs” → “run”
Lemmatization
- Converting a word to its base form using grammar and a dictionary
- Example: “better” → “good”, “am/are/is” → “be”
Punctuation & Special Character Removal
- Removing unnecessary symbols, digits, or punctuation
- Example: “!!!” → “”
Handling Negations
- Preserving negated phrases such as “not good” so that their meaning is not lost
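
The steps above can be chained into one small pipeline. The sketch below is deliberately crude and English-only. The stop-word list, the `crudeStem` suffix stripper (a stand-in for a real stemmer, which over-stems some words such as "this"), and the `not_` prefix convention for negation are all assumptions made for illustration. Lemmatization is omitted because it needs a dictionary.

```javascript
// Illustrative stop-word and negator lists (assumed; real lists are much longer).
const STOP_WORDS = new Set(["the", "is", "and", "of", "was", "a"]);
const NEGATORS = new Set(["not", "no", "never"]);

// Very crude suffix stripper, standing in for a real stemmer.
function crudeStem(word) {
  if (word.length > 5 && word.endsWith("ing")) {
    let stem = word.slice(0, -3);
    // Undo a doubled final consonant: "runn" -> "run".
    if (/([b-df-hj-np-tv-z])\1$/.test(stem)) stem = stem.slice(0, -1);
    return stem;
  }
  if (word.length > 3 && word.endsWith("s") && !word.endsWith("ss")) {
    return word.slice(0, -1);
  }
  return word;
}

function preprocess(text) {
  const tokens = text
    .toLowerCase()               // lowercasing
    .replace(/[^a-z\s']/g, " ")  // remove punctuation, digits, and symbols
    .split(/\s+/)                // tokenization on whitespace
    .filter(Boolean);
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    // Negation handling: fuse the negator with the next word so "not good" survives
    // stop-word removal as a single feature.
    if (NEGATORS.has(t) && i + 1 < tokens.length) {
      out.push(`not_${crudeStem(tokens[i + 1])}`);
      i++;
      continue;
    }
    if (STOP_WORDS.has(t)) continue; // stop-word removal
    out.push(crudeStem(t));          // stemming
  }
  return out;
}

const cleaned = preprocess("The movie was NOT good, but the actors keep running!!!");
console.log(cleaned);
```

Negation is handled before stop-word removal on purpose: many stop-word lists include "not", and dropping it first would turn "not good" into "good".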

🔎 Feature Extraction

Feature Extraction is the process of converting preprocessed text into numerical vectors so that Machine Learning or Deep Learning models can work with it.

Main Techniques

1. Bag of Words (BoW)

Concept: represent a text by counting how many times each word occurs, ignoring word order and grammar
Pros: easy to understand and use
Cons: loses word context and produces sparse data

Example:

Text: “The movie was great, great acting”
Features: {the:1, movie:1, was:1, great:2, acting:1}
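
The counting in this example can be reproduced with a few lines of JavaScript. The function name `bagOfWords` and the simple letters-only tokenizer are assumptions for this sketch.

```javascript
// Count how often each lowercase word occurs, ignoring order and punctuation.
function bagOfWords(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

const bow = bagOfWords("The movie was great, great acting");
console.log(Object.fromEntries(bow));
```

Across a whole corpus, each distinct word becomes one vector dimension, and most entries are zero for any single document. That is the sparsity noted under Cons.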

2. TF–IDF

Term Frequency – Inverse Document Frequency

Concept: weight each term by how often it occurs in a document relative to how many documents in the whole collection contain it
Pros: down-weights very common words such as “the” and “is”
Cons: still cannot capture the contextual meaning of words

Example: in product reviews, the word “quality” gets a higher weight than the word “the”
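
A minimal sketch of the weighting follows. It uses the plain textbook form: tf is the term count divided by the document length, and idf is log(N / df), where N is the number of documents and df is the number of documents containing the term. Libraries often use smoothed variants, so their numbers will differ. The three short reviews and the function names are assumptions for illustration.

```javascript
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Return one Map(term -> tf-idf weight) per document.
function tfidf(docs) {
  const tokenized = docs.map(tokenize);
  // Document frequency: in how many documents does each term occur?
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const N = docs.length;
  return tokenized.map((tokens) => {
    const weights = new Map();
    for (const term of tokens) weights.set(term, (weights.get(term) ?? 0) + 1);
    for (const [term, count] of weights) {
      const tf = count / tokens.length;
      const idf = Math.log(N / df.get(term));
      weights.set(term, tf * idf);
    }
    return weights;
  });
}

// Hypothetical mini-corpus of product reviews.
const reviews = ["The quality is great", "The delivery is slow", "The price is fair"];
const weights = tfidf(reviews);
console.log(weights[0].get("the"), weights[0].get("quality"));
```

Because "the" occurs in every review, its idf is log(3/3) = 0 and its weight vanishes, while "quality" occurs in only one review and keeps a positive weight.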

3. Word Embeddings

Concept: map words to dense vectors, so that words with similar meanings lie close together in the vector space
Models: Word2Vec, GloVe, fastText
Pros: captures semantic relationships between words well
Cons: pre-trained vectors may not cover domain-specific vocabulary

Example:

king – man + woman ≈ queen
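
The analogy can be made concrete with vector arithmetic and cosine similarity. The three-dimensional vectors below are made up by hand so that the arithmetic works out. They are not taken from Word2Vec, GloVe, or fastText, whose vectors have hundreds of learned dimensions. The function name `analogy` is likewise an assumption.

```javascript
// Hand-made toy "embeddings" (hypothetical values, roughly: royalty, male, female).
const VECTORS = {
  king:   [0.9, 0.9, 0.1],
  queen:  [0.9, 0.1, 0.9],
  man:    [0.1, 0.9, 0.1],
  woman:  [0.1, 0.1, 0.9],
  prince: [0.8, 0.9, 0.2],
};

const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const cosine = (a, b) => dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));

// Solve "a is to b as c is to ?" by finding the word nearest to b - a + c.
function analogy(a, b, c) {
  const target = VECTORS[b].map((x, i) => x - VECTORS[a][i] + VECTORS[c][i]);
  let best = null;
  let bestSim = -Infinity;
  for (const [word, vec] of Object.entries(VECTORS)) {
    if (word === a || word === b || word === c) continue; // skip the query words
    const sim = cosine(target, vec);
    if (sim > bestSim) {
      best = word;
      bestSim = sim;
    }
  }
  return best;
}

const answer = analogy("man", "king", "woman"); // king - man + woman
console.log(answer);
```

Excluding the query words from the candidates is the usual convention, since otherwise the nearest vector is often one of the inputs.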

4. Contextual Embeddings

Concept: use advanced language models (e.g. BERT, RoBERTa, GPT embeddings) to capture the meaning of a word from its context in the sentence
Pros: context-aware, and among the strongest performers on current NLP tasks
Cons: computationally expensive

Example: “bank” in “river bank” ≠ “bank” in “financial bank”

Model

Classification

Input: Raw text such as reviews, tweets, or news articles
⚙ Process: A classification model such as Naive Bayes, Logistic Regression, SVM, or a Neural Network
Output: Discrete labels such as Positive / Negative / Neutral or Spam / Not Spam
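
To make the input → model → label flow concrete, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The four training sentences, the label names, and the function names are all hypothetical. A real classifier would be trained on far more data, usually with an established library.

```javascript
// Hypothetical helper: lowercase and keep runs of letters.
const nbTokens = (s) => s.toLowerCase().match(/[a-z]+/g) ?? [];

// Count documents per label and words per label.
function trainNaiveBayes(examples) {
  const model = { docs: new Map(), words: new Map(), totals: new Map(), vocab: new Set(), n: examples.length };
  for (const [text, label] of examples) {
    model.docs.set(label, (model.docs.get(label) ?? 0) + 1);
    if (!model.words.has(label)) model.words.set(label, new Map());
    const counts = model.words.get(label);
    for (const t of nbTokens(text)) {
      counts.set(t, (counts.get(t) ?? 0) + 1);
      model.totals.set(label, (model.totals.get(label) ?? 0) + 1);
      model.vocab.add(t);
    }
  }
  return model;
}

// Pick the label with the highest log prior + sum of smoothed log likelihoods.
function predictNaiveBayes(model, text) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, nDocs] of model.docs) {
    let score = Math.log(nDocs / model.n);
    const counts = model.words.get(label);
    const total = model.totals.get(label) ?? 0;
    for (const t of nbTokens(text)) {
      if (!model.vocab.has(t)) continue; // words never seen in training are ignored
      score += Math.log(((counts.get(t) ?? 0) + 1) / (total + model.vocab.size));
    }
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}

// Made-up training data for illustration only.
const nbModel = trainNaiveBayes([
  ["great product love it", "positive"],
  ["excellent quality great value", "positive"],
  ["awful service slow delivery", "negative"],
  ["bad quality awful battery", "negative"],
]);
console.log(predictNaiveBayes(nbModel, "great quality")); // positive
```

The smoothing term keeps a word that never occurred with a label from forcing that label's probability to zero.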

Regression

Input: Raw text such as reviews, financial news, or social media posts
⚙ Process: A regression model such as Linear Regression, Ridge/Lasso, SVR, or Neural Networks
Output: A continuous numeric value, e.g., Predicted Rating = 4.2, Stock Change = –1.5%, Engagement Score = 2000 likes

Clustering

Input: Raw text such as customer reviews, research articles, or survey responses
⚙ Process: A clustering model such as k-Means, Hierarchical Clustering, or DBSCAN, or topic modeling such as LDA
Output: Groups of similar texts, e.g., Delivery Issues, Price Concerns, Product Quality

Results and Data Visualization

After preprocessing, feature extraction, and classification, the system produces results that can be interpreted and visualized.

🔹 Key Outputs

Sentiment Label
- The primary output of classification
- Categories: Positive, Negative, or Neutral; a fine-grained scheme may range from “very positive” to “very negative”
- Example: “The product is excellent” → Positive

Sentiment Score / Probability
- A numeric value that expresses both the polarity and the intensity of the sentiment
- Range: –1.0 (very negative) to +1.0 (very positive) for a polarity score; a class probability lies between 0 and 1
- Example:
  - “I love this phone” → +0.85
  - “The service is awful” → –0.90
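
One simple way to obtain such a score is a lexicon lookup: average the polarity of the sentiment words found in the text, flipping the sign after a negator. The sketch below assumes a tiny hand-made lexicon and a ±0.1 neutrality threshold, both hypothetical. It will not reproduce the exact values above, which would come from a trained model or a full lexicon.

```javascript
// Hypothetical toy lexicon: word -> polarity in [-1, 1].
const toyLexicon = new Map([
  ["love", 0.9], ["excellent", 1.0], ["great", 0.8], ["good", 0.6],
  ["slow", -0.4], ["bad", -0.7], ["awful", -0.9],
]);
const negators = new Set(["not", "no", "never"]);

// Mean polarity of matched words; a directly preceding negator flips the sign.
function sentimentScore(text) {
  const tokens = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const hits = [];
  tokens.forEach((t, i) => {
    if (!toyLexicon.has(t)) return;
    const flip = i > 0 && negators.has(tokens[i - 1]) ? -1 : 1;
    hits.push(flip * toyLexicon.get(t));
  });
  return hits.length ? hits.reduce((a, b) => a + b, 0) / hits.length : 0;
}

// Map a score to a label; the threshold is an arbitrary illustrative choice.
function sentimentLabel(score, threshold = 0.1) {
  if (score > threshold) return "Positive";
  if (score < -threshold) return "Negative";
  return "Neutral";
}

for (const s of ["I love this phone", "The service is awful", "This is not good"]) {
  const score = sentimentScore(s);
  console.log(s, "→", score, sentimentLabel(score));
}
```

The negation check is why “not good” scores –0.6 rather than +0.6. Dropping “not” as a stopword before scoring would remove that signal.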

Aspect-Based Sentiment
- Reports sentiment for specific features (aspects) of a product
- Example: “The phone’s camera is great but the battery is bad”
  - Camera → Positive (+0.8)
  - Battery → Negative (–0.7)
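
A rough sketch of the idea: split the sentence into clauses at “but” and at punctuation, then attach each clause's lexicon score to any aspect word it contains. The aspect list and lexicon values here are hypothetical and were chosen to mirror the example above. Practical ABSA systems rely on trained models rather than keyword rules.

```javascript
// Hypothetical lexicon and aspect list, chosen to mirror the example sentence.
const aspectLexicon = new Map([["great", 0.8], ["good", 0.6], ["bad", -0.7], ["awful", -0.9]]);
const knownAspects = ["camera", "battery", "screen"];

// Returns Map(aspect -> mean polarity of the clause that mentions it).
function aspectSentiment(text) {
  const result = new Map();
  // Clause boundaries: the word "but" or basic punctuation.
  const clauses = text.toLowerCase().split(/\bbut\b|[,;.]/);
  for (const clause of clauses) {
    const tokens = clause.match(/[a-z]+/g) ?? [];
    const scores = tokens.filter((t) => aspectLexicon.has(t)).map((t) => aspectLexicon.get(t));
    if (!scores.length) continue; // no sentiment word in this clause
    const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
    for (const aspect of knownAspects) {
      if (tokens.includes(aspect)) result.set(aspect, mean);
    }
  }
  return result;
}

const absa = aspectSentiment("The phone’s camera is great but the battery is bad");
console.log(Object.fromEntries(absa)); // { camera: 0.8, battery: -0.7 }
```

A single sentence-level score for this review would average out to roughly neutral and hide both findings, which is the motivation for reporting sentiment per aspect.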

📊 Visualization Techniques

Pie Charts
- Show the proportion of positive, negative, and neutral reviews
Bar Charts
- Compare sentiment across products, brands, or time periods
Time-Series Plots
- Show sentiment trends over time, e.g., tweets during a major event
Word Clouds
- Highlight the positive/negative words that occur most often in the text
Dashboards
- Combine charts and key performance indicators (KPIs) to help managers make decisions quickly

Interactive Bag of Words

(async () => {
  // ---- Load Plot (with fallback) ----
  let Plot;
  try {
    Plot = await require("@observablehq/plot@0.6.17");
  } catch (err) {
    const m = await import("https://esm.sh/@observablehq/plot@0.6?bundle");
    Plot = m.default || m;
  }

  // ---- Shell & Styles ----
  const box = html`<div style="max-width:1400px;margin:0 auto;font:14px system-ui;">
    <style>
      :root { --bow-border:#111; --bow-muted:#6b7280; }
      .layout { display:grid; grid-template-columns: 380px 1fr; gap:18px; align-items:start; }
      .card { background:#fff; border:1px solid #e5e7eb; border-radius:12px; padding:12px; }
      .row { display:flex; gap:10px; align-items:center; flex-wrap:wrap; }
      .label { font-weight:700; font-size:16px; margin-top:4px; }
      .hint { color:var(--bow-muted); font-size:12px; }
      .textarea { width:100%; min-height:220px; padding:10px 12px; border:3px solid var(--bow-border); border-radius:6px; box-sizing:border-box; font:14px/1.5 system-ui; }
      .field input[type=text] { width:100%; padding:8px 10px; border:3px solid var(--bow-border); border-radius:6px; box-sizing:border-box; }
      .h-radio { display:flex; gap:14px; align-items:center; flex-wrap:wrap; }
      .pill { display:inline-block; padding:2px 8px; border-radius:999px; border:1px solid #ddd; font-size:12px; }
      .topk { display:flex; gap:10px; align-items:center; }
      .topk input[type=number]{ width:88px; padding:6px 8px; border:3px solid var(--bow-border); border-radius:6px; font:14px system-ui; }
      .topk input[type=range]{ width:55%; min-width:260px; }
      #plotWrap { min-height:220px; }
      table.tbl { width:100%; border-collapse:collapse; }
      table.tbl th, table.tbl td { border-bottom:1px solid #eee; padding:6px 8px; text-align:left; }
      table.tbl th { border-bottom:1px solid #ddd; }
      .mono { font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace; }
      details.stopbox > summary { cursor:pointer; user-select:none; font-weight:600; }
      textarea.small { width:100%; min-height:72px; font:12px/1.45 system-ui; }
      .toolbar { display:flex; gap:8px; align-items:center; flex-wrap:wrap; }
      .btn { border:1px solid #d1d5db; background:#f9fafb; border-radius:8px; padding:6px 10px; cursor:pointer; }
      .btn:active { transform: translateY(1px); }
      .muted { color:#6b7280; }
      .empty { color:#666; font-size:13px; }
      @media (max-width: 980px){ .layout{ grid-template-columns: 1fr; } .topk input[type=range]{ width:100%; } }
    </style>

    <div class="layout">
      <!-- Left: Controls -->
      <div class="card" id="controls">
        <div class="row"><span class="pill">Bag-of-Words Lab</span></div>

        <div class="label">Input text</div>
        <textarea id="txt" class="textarea" spellcheck="false"
          placeholder="Type/paste English text here..."></textarea>

        <div class="label" style="margin-top:10px;">Normalization</div>
        <div id="normRow" class="h-radio"></div>

        <div class="label" style="margin-top:10px;">Filter out</div>
        <div class="hint">Enter words separated by spaces (e.g., the a an of to and ...)</div>
        <div class="field"><input id="stopInp" type="text" placeholder="the a an of to and ..." /></div>

        <details class="stopbox" style="margin-top:10px;">
          <summary>Stopwords Menu <span class="hint">(quick presets & custom)</span></summary>
          <div class="row" style="margin-top:8px; align-items:flex-start;">
            <div style="flex:1;">
              <div class="hint">Preset EN stopwords</div>
              <textarea id="presetEN" class="small"></textarea>
            </div>
            <div style="flex:1;">
              <div class="hint">Custom stopwords (space/comma/newline)</div>
              <textarea id="customSW" class="small" placeholder="add your own..."></textarea>
            </div>
          </div>
          <div class="toolbar" style="margin-top:6px;">
            <button id="applySW" class="btn">Apply stopwords</button>
            <span id="swInfo" class="muted"></span>
          </div>
        </details>

        <div class="label" style="margin-top:10px;">Top-K</div>
        <div class="topk">
          <input id="kNum" type="number" value="15" min="1" max="200" step="1" />
          <input id="kRng" type="range" value="15" min="1" max="200" step="1" />
          <span class="muted">words</span>
        </div>

        <div class="toolbar" style="margin-top:10px;">
          <button id="rebuild" class="btn">Rebuild</button>
          <button id="reset" class="btn">Reset sample</button>
          <span id="status" class="muted"></span>
        </div>
      </div>

      <!-- Right: Results -->
      <div class="card">
        <div class="row"><span class="pill">Results</span></div>
        <div id="plotWrap" style="margin-top:6px;"></div>

        <div class="toolbar" style="margin-top:10px;">
          <label class="h-radio" style="gap:8px;">
            <input id="showTbl" type="checkbox" checked />
            <span>Show table</span>
          </label>
          <button id="copyCSV" class="btn">Copy CSV</button>
          <button id="dlCSV" class="btn">Download CSV</button>
          <span id="meta" class="muted"></span>
        </div>
        <div id="tableWrap" style="margin-top:8px;"></div>
      </div>
    </div>
  </div>`;

  // ---- Controls refs ----
  const txt = box.querySelector("#txt");
  const normRow = box.querySelector("#normRow");
  const stopInp = box.querySelector("#stopInp");
  const presetEN = box.querySelector("#presetEN");
  const customSW = box.querySelector("#customSW");
  const applySW = box.querySelector("#applySW");
  const swInfo   = box.querySelector("#swInfo");
  const kNum = box.querySelector("#kNum");
  const kRng = box.querySelector("#kRng");
  const rebuildBtn = box.querySelector("#rebuild");
  const resetBtn   = box.querySelector("#reset");
  const status     = box.querySelector("#status");
  const plotWrap = box.querySelector("#plotWrap");
  const showTbl = box.querySelector("#showTbl");
  const copyCSV = box.querySelector("#copyCSV");
  const dlCSV   = box.querySelector("#dlCSV");
  const meta    = box.querySelector("#meta");
  const tableWrap = box.querySelector("#tableWrap");

  // ---- Normalization radios ----
  function radio(name, items, value){
    const wrap = document.createElement("div");
    wrap.className = "h-radio";
    items.forEach(v => {
      const id = `${name}-${v}`;
      const lab = html`<label for="${id}" style="display:inline-flex;gap:6px;align-items:center;">
        <input type="radio" name="${name}" id="${id}" value="${v}" ${v===value?'checked':''}/>
        <span>${v}</span>
      </label>`;
      wrap.append(lab);
    });
    return wrap;
  }
  const normWidget = radio("norm", ["none","stem","lemma"], "none");
  normRow.append(normWidget);

  function normValue(){
    const el = normWidget.querySelector("input:checked");
    return el ? el.value : "none";
  }

  // ---- Preset EN stopwords ----
  const PRESET_EN = "a an the and or of to in on for with at from by is am are was were be been being it its as that this these those not very really so too just have has had do does did can could should would will about into over than then out up down more most less least again only also if when while which who whom what why how all any each other some no yes ever even".split(/\s+/);
  presetEN.value = PRESET_EN.join(" ");

  // ---- State ----
  let STOP = new Set();
  let rowsAll = [];   // [{word,count}]
  let rowsTop = [];   // top-K applied

  // ---- Sample text & reset ----
  const sample = [
    "I absolutely love this product. It’s incredibly easy to use and the design is delightful!",
    "But the battery is not great, and the app sometimes feels slow.",
    "Great value for money and super easy to use; onboarding is confusing in parts.",
    "Customer support was helpful and incredibly quick to respond."
  ].join("\n");
  function resetSample(){ txt.value = sample; }
  resetSample();

  // ---- Utils ----
  function tokenize(s){
    return (s||"").toLowerCase().match(/[a-z]+/g) ?? [];
  }
  function stem(w){
    let s = (w||"").toLowerCase().replace(/['’]s?$/, "");
    if (!s) return s;
    // Suffix rules are applied in order. The sibilant plural rule comes first
    // so that "boxes" -> "box" before the generic "s" rule can fire.
    const rep = [
      [/(x|ch|sh|ss|z)es$/, "$1"], [/ies$/, "y"], [/s$/, ""],
      [/ingly$/, ""], [/edly$/, ""], [/ing$/, ""], [/ed$/, ""],
      [/ational$/, "ate"], [/tional$/, "tion"], [/izer$/, "ize"],
      [/isation$/, "ize"], [/fulness$/, "ful"], [/ousness$/, "ous"],
      [/iveness$/, "ive"], [/ment$/, ""], [/ness$/, ""], [/able$/, ""],
      [/ible$/, ""], [/al$/, ""], [/er$/, ""], [/est$/, ""], [/ly$/, ""]
    ];
    for (const [re, r] of rep) s = s.replace(re, r);
    // Collapse a trailing doubled consonant left by suffix stripping ("stopp" -> "stop").
    s = s.replace(/([b-df-hj-np-tv-z])\1$/, "$1");
    return s;
  }
  const lemma = (() => {
    const irr = new Map(Object.entries({
      am:"be", is:"be", are:"be", was:"be", were:"be", been:"be",
      has:"have", had:"have", does:"do", did:"do", done:"do",
      went:"go", gone:"go", ran:"run", running:"run",
      ate:"eat", eaten:"eat", saw:"see", seen:"see",
      bought:"buy", brought:"bring", thought:"think",
      better:"good", best:"good", worse:"bad", worst:"bad",
      children:"child", men:"man", women:"woman",
      mice:"mouse", geese:"goose", feet:"foot", teeth:"tooth", people:"person"
    }));
    return function (w){
      if (!w) return "";
      let s = String(w).toLowerCase().replace(/['’]s?$/, "");
      if (irr.has(s)) return irr.get(s);
      if (/(^.{3,})ies$/.test(s)) return s.slice(0, -3) + "y";
      if (/(xes|ches|shes|sses|zes)$/.test(s)) return s.slice(0, -2);
      if (/s$/.test(s) && !/ss$/.test(s)) s = s.slice(0, -1);
      if (/(^.{3,})ied$/.test(s)) return s.slice(0, -3) + "y";
      if (/([b-df-hj-np-tv-z])\1ed$/.test(s)) return s.slice(0, -3);
      if (/ed$/.test(s) && s.length > 3) s = s.replace(/ed$/, "");
      if (/([b-df-hj-np-tv-z])\1ing$/.test(s)) return s.slice(0, -4);
      if (/ing$/.test(s) && s.length > 4) s = s.slice(0, -3);
      if (/(^.{3,})iest$/.test(s)) return s.slice(0, -4) + "y";
      if (/(^.{3,})ier$/.test(s))  return s.slice(0, -3) + "y";
      if (/est$/.test(s) && s.length > 4) s = s.slice(0, -3);
      if (/er$/.test(s)  && s.length > 4) s = s.slice(0, -2);
      if (/ly$/.test(s)  && s.length > 4) s = s.slice(0, -2);
      return irr.get(s) || s;
    };
  })();

  function buildStopSet(){
    const manual = (stopInp.value || "").toLowerCase().match(/[a-z]+/g) ?? [];
    const preset = (presetEN.value || "").toLowerCase().match(/[a-z]+/g) ?? [];
    const custom = (customSW.value || "").toLowerCase().split(/[\s,]+/).filter(Boolean);
    STOP = new Set([...preset, ...manual, ...custom]);
    swInfo.textContent = `Stopwords loaded: ${STOP.size}`;
  }

  // ---- Core pipeline ----
  function process(){
    const tokens = tokenize(txt.value);
    const kept = tokens.filter(w => !STOP.has(w));
    const norm = normValue();
    const final = norm === "stem" ? kept.map(stem)
               : norm === "lemma" ? kept.map(lemma)
               : kept;

    const m = new Map();
    for (const w of final) m.set(w, (m.get(w)||0) + 1);
    rowsAll = Array.from(m, ([word, count]) => ({word, count}))
                   .sort((a,b)=> b.count - a.count || a.word.localeCompare(b.word));

    const K = +kNum.value || 15;
    rowsTop = rowsAll.slice(0, Math.min(K, rowsAll.length));
  }

  function renderPlot(){
    plotWrap.innerHTML = "";
    if (!rowsTop.length){
      plotWrap.innerHTML = `<div class="empty">No tokens to display. Try removing some stopwords or adding more text.</div>`;
      return;
    }
    const fig = Plot.plot({
      width: plotWrap.clientWidth || 800,
      height: Math.max(220, rowsTop.length * 26),
      marginLeft: 110,
      x: { label: "Count →" },
      y: { domain: rowsTop.map(d=>d.word) },
      marks: [
        Plot.barX(rowsTop, {x:"count", y:"word", fill:"#4f46e5"}),
        Plot.text(rowsTop, {x:"count", y:"word", text: d=>d.count, dx:6, textAnchor:"start", fill:"#111"})
      ]
    });
    plotWrap.append(fig);
  }

  function renderTable(){
    tableWrap.innerHTML = "";
    if (!showTbl.checked) { tableWrap.style.display = "none"; return; }
    tableWrap.style.display = "";

    const tbl = html`<table class="tbl">
      <thead><tr><th>word</th><th style="text-align:right">count</th></tr></thead>
      <tbody></tbody>
    </table>`;
    const tb = tbl.querySelector("tbody");
    for (const r of rowsAll){
      tb.insertAdjacentHTML("beforeend",
        `<tr><td class="mono">${r.word}</td><td class="mono" style="text-align:right">${r.count}</td></tr>`
      );
    }
    tableWrap.append(tbl);
  }

  function syncMeta(){
    meta.textContent = `Vocab: ${rowsAll.length} • Showing top-K: ${rowsTop.length}`;
  }

  function toCSV(rows){
    const esc = v => `"${String(v).replace(/"/g,'""')}"`;
    return ["word,count"].concat(rows.map(r=>`${esc(r.word)},${r.count}`)).join("\n");
  }

  function rebuild(){
    buildStopSet();
    process();
    renderPlot();
    renderTable();
    syncMeta();
    status.textContent = "Updated";
    setTimeout(()=> status.textContent = "", 800);
  }

  // ---- Wire up ----
  // sync K number/range
  function syncKFromNum(){ kRng.value = kNum.value; rebuild(); }
  function syncKFromRng(){ kNum.value = kRng.value; rebuild(); }
  kNum.addEventListener("input", syncKFromNum);
  kRng.addEventListener("input", syncKFromRng);

  // auto rebuild on change (debounced)
  const debounce = (fn, ms=120)=>{ let t; return (...a)=>{ clearTimeout(t); t=setTimeout(()=>fn(...a),ms); }; };
  const schedule = debounce(rebuild, 120);

  txt.addEventListener("input", schedule);
  stopInp.addEventListener("input", schedule);
  normWidget.addEventListener("input", rebuild);
  applySW.addEventListener("click", rebuild);
  showTbl.addEventListener("change", renderTable);
  rebuildBtn.addEventListener("click", rebuild);
  resetBtn.addEventListener("click", () => { resetSample(); rebuild(); });
  window.addEventListener("resize", debounce(()=>{ renderPlot(); }, 80), {passive:true});

  // CSV actions
  copyCSV.addEventListener("click", async () => {
    const csv = toCSV(rowsAll);
    try { await navigator.clipboard.writeText(csv); copyCSV.textContent = "Copied!"; setTimeout(()=>copyCSV.textContent="Copy CSV",800); }
    catch { alert(csv); }
  });
  dlCSV.addEventListener("click", () => {
    const csv = toCSV(rowsAll);
    const blob = new Blob([csv], {type:"text/csv;charset=utf-8"});
    const url = URL.createObjectURL(blob);
    const a = document.createElement("a");
    const ts = new Date().toISOString().slice(0,19).replace(/[:T]/g,"-");
    a.href = url; a.download = `bag_of_words-${ts}.csv`; document.body.appendChild(a); a.click(); a.remove();
    URL.revokeObjectURL(url);
  });

  // ---- First build ----
  rebuild();

  return box;
})()

Interactive Word Cloud (demo)

(async () => {
  const box = html`<div style="max-width:1200px;font:14px system-ui;">
    <style>
      .layout { display:grid; grid-template-columns: 360px 1fr; gap:16px; }
      .card { background:#fff; border:1px solid #e5e7eb; border-radius:12px; padding:12px; }
      .row { display:flex; gap:10px; align-items:center; flex-wrap:wrap; }
      .pill { display:inline-block; padding:2px 8px; border-radius:999px; font:12px system-ui; border:1px solid #ddd; }
      #cloudWrap { position:relative; width:100%; min-height:560px; border:1px dashed #ddd; border-radius:12px; overflow:hidden; background:#fafafa; }
      .token { position:absolute; cursor:pointer; user-select:none; white-space:nowrap; transition: transform .06s ease-out, opacity .2s; }
      .token:hover { outline:1px dashed rgba(0,0,0,.25); outline-offset:2px; }
      .kwic { font:13px/1.5 system-ui; }
      .kwic b { background: #fff3b0; padding:0 2px; border-radius:3px; }
      .hint { color:#666; font-size:12px; }
      .empty { position:absolute; inset:0; display:flex; align-items:center; justify-content:center; color:#666; font-size:13px; }
      details.stopbox > summary { cursor:pointer; user-select:none; }
      textarea.small { width:100%; min-height:80px; font:12px/1.4 system-ui; }
      .badge { font:11px system-ui; border:1px solid #ddd; padding:1px 6px; border-radius:999px; }
    </style>

    <div class="layout">
      <div class="card">
        <div class="row"><span class="pill">Word Cloud Controls</span></div>

        <div style="margin-top:8px">
          <label>Language</label>
          <div class="row" id="langRow"></div>

          <label style="display:block;margin-top:8px">N-gram</label>
          <div class="row" id="ngRow"></div>

          <div style="margin-top:8px" id="txtRow"></div>

          <div style="margin-top:8px">
            <b>Options</b>
            <div class="row" id="optRow"></div>
            <div style="margin-top:6px" id="rngRow"></div>
            <div style="margin-top:6px" id="rngRow2"></div>
          </div>

          <details class="stopbox" style="margin-top:10px">
            <summary><b>Stopwords Menu</b> <span class="hint">(manage function words/connectives such as is, am, are, the, a, etc.)</span></summary>
            <div style="margin-top:8px" id="stopCtrl"></div>
          </details>

          <div class="row" style="margin-top:10px" id="btnRow"></div>
          <div class="hint" style="margin-top:6px">
            Tip: If the cloud is too crowded, try lowering Max words or raising Min font
          </div>
        </div>
      </div>

      <div class="card">
        <div class="row"><span class="pill">Interactive Word Cloud</span></div>
        <div id="stats" style="margin-top:6px"></div>
        <div id="cloudWrap" style="margin-top:8px"></div>
      </div>
    </div>

    <div class="card" style="margin-top:16px">
      <div class="row"><span class="pill">KWIC (Key Word In Context)</span></div>
      <div id="kwic" class="kwic" style="margin-top:8px"></div>
    </div>
  </div>`;

  // -------- Controls --------
  const langSel = Inputs.radio(["English","Thai"], {value:"English"});
  const ngSel   = Inputs.radio(["unigram","bigram","trigram"], {value:"unigram"});

  const sampleTxt = [
    "I absolutely love this product. It’s incredibly easy to use and the design is delightful!",
    "But the battery is not great, and the app sometimes feels slow.",
    "บริการรวดเร็วมาก ทีมงานตอบไว ประทับใจสุด ๆ ใช้งานง่าย",
    "แต่ราคาค่อนข้างแพง และบางครั้งก็มีอาการค้าง ไม่ค่อยดีเท่าไหร่"
  ].join("\n");

  const txtArea   = Inputs.textarea({label:"Text", value: sampleTxt, rows:12});
  const caseFold  = Inputs.toggle({label:"Lowercase (EN)", value:true});
  const rmStop    = Inputs.toggle({label:"Remove stopwords", value:true});
  const stripShort= Inputs.toggle({label:"Remove short tokens (≤2 letters)", value:false});
  const rotateOpt = Inputs.select(["none","±30°","±60°","random"], {label:"Rotation", value:"±30°"});
  const scaleSel  = Inputs.select(["sqrt","linear","log"], {label:"Size scale", value:"sqrt"});

  const maxWords  = Inputs.range([50, 2000], {label:"Max words", value:300, step:50}); // adjustable up to 2,000
  const minFreq   = Inputs.range([1, 50],   {label:"Min frequency", value:1, step:1});
  const minFont   = Inputs.range([8, 36],   {label:"Min font (px)", value:12, step:1});
  const maxFont   = Inputs.range([28, 160], {label:"Max font (px)", value:80, step:2});

  const runBtn     = Inputs.button("Build cloud");
  const shuffleBtn = Inputs.button("Shuffle layout");

  box.querySelector("#langRow").append(langSel);
  box.querySelector("#ngRow").append(ngSel);
  box.querySelector("#txtRow").append(txtArea);
  box.querySelector("#optRow").append(caseFold, rmStop, stripShort, rotateOpt, scaleSel);
  box.querySelector("#rngRow").append(maxWords, minFreq);
  box.querySelector("#rngRow2").append(minFont, maxFont);
  box.querySelector("#btnRow").append(runBtn, shuffleBtn);

  const stats = box.querySelector("#stats");
  const cloudWrap = box.querySelector("#cloudWrap");
  const kwicBox = box.querySelector("#kwic");

  // -------- Stopwords Base & Menu --------
  const STOP_EN_BASE = new Set(("a,an,the,and,or,of,to,in,on,for,with,at,from,by,is,am,are,was,were,be,been,being,it,its,as,that,this,these,those,not,very,really,so,too,just,have,has,had,do,does,did,can,could,should,would,will,about,into,over,than,then,out,up,down,more,most,less,least,again,only,also,if,when,while,which,who,whom,what,why,how,all,any,each,other,some,no,yes,ever,even").split(","));
  const STOP_TH_BASE = new Set(("และ,หรือ,ของ,ที่,ได้,ใน,บน,ให้,กับ,จาก,ว่า,ก็,ค่ะ,ครับ,นะ,น่ะ,เลย,มาก,สุดๆ,ๆ,ก็ได้,อีก,ยัง,จึง,เพราะ,แต่,เมื่อ,ซึ่ง,คือ,เป็น,ได้ว่า,ได้ไหม,โดย,อยู่,ไป,มา,แล้ว,ด้วย,หรือไม่,ไม่,ไม่ได้").split(","));

  const useDefaultEN = Inputs.toggle({label:"Use default EN stopwords", value:true});
  const useDefaultTH = Inputs.toggle({label:"Use default TH stopwords", value:true});
  const customEN = Inputs.textarea({label:"Custom EN stopwords (comma/space/newline)", rows:4, value:""});
  const customTH = Inputs.textarea({label:"Custom TH stopwords (space/comma/newline separated)", rows:4, value:""});
  const applyStop = Inputs.button("Apply stopwords");

  const stopCtrl = box.querySelector("#stopCtrl");
  stopCtrl.append(
    html`<div class="row"><span class="badge">English</span></div>`,
    useDefaultEN, customEN,
    html`<div class="row" style="margin-top:6px"><span class="badge">Thai</span></div>`,
    useDefaultTH, customTH,
    html`<div class="row" style="margin-top:8px">${applyStop}</div>`,
    html`<div class="hint" id="stopInfo" style="margin-top:6px"></div>`
  );
  const stopInfo = box.querySelector("#stopInfo");

  function parseCustomList(text){
    return new Set(text.split(/[\s,]+/).map(s=>s.trim()).filter(Boolean));
  }

  let STOP_EN = new Set(STOP_EN_BASE);
  let STOP_TH = new Set(STOP_TH_BASE);
  function rebuildStop(){
    const addEN = parseCustomList(customEN.value);
    const addTH = parseCustomList(customTH.value);
    STOP_EN = new Set(useDefaultEN.value ? [...STOP_EN_BASE, ...addEN] : [...addEN]);
    STOP_TH = new Set(useDefaultTH.value ? [...STOP_TH_BASE, ...addTH] : [...addTH]);
    stopInfo.innerHTML = `EN stopwords: <b>${STOP_EN.size}</b> • TH stopwords: <b>${STOP_TH.size}</b>`;
  }
  applyStop.addEventListener("click", () => { rebuildStop(); buildCloud(false); });
  rebuildStop();

  // -------- Helpers --------
  const palette = ["#4E79A7","#F28E2B","#E15759","#76B7B2","#59A14F","#EDC949","#AF7AA1","#FF9DA7","#9C755F","#BAB0AC"];

  function extent(arr){
    if (!arr.length) return [0,1];
    let mn=arr[0], mx=arr[0];
    for (const v of arr){ if(v<mn) mn=v; if(v>mx) mx=v; }
    return [mn,mx];
  }

  function makeScale(kind, domain, range){
    const [d0,d1] = domain, [r0,r1] = range;
    if (d1 === d0) return () => (r0 + r1) / 2;
    if (kind === "linear"){
      const m = (r1-r0)/(d1-d0); return v => r0 + m*(v - d0);
    }
    if (kind === "log"){
      const a = Math.max(1e-9, d0), b = Math.max(a*1.000001, d1);
      const la = Math.log(a), lb = Math.log(b);
      const m = (r1-r0)/(lb-la); return v => r0 + m*(Math.log(Math.max(a, v)) - la);
    }
    // sqrt
    const m = 1/Math.sqrt(d1 - d0);
    return v => r0 + (r1 - r0) * Math.sqrt(Math.max(0, v - d0)) * m;
  }

  // -------- Tokenization --------
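  function tokenize(text, lang){
    if (lang === "Thai"){
      // Thai is written without spaces between words, so splitting on whitespace
      // yields phrase-level chunks rather than words; this demo does no word segmentation.
      return text.replace(/[“”"(),.!?:;[\]\-—]/g, " ")
                 .split(/\s+/).map(t=>t.trim()).filter(Boolean);
    }
    // Case folding is left to the caller (the "Lowercase (EN)" toggle),
    // so both upper- and lowercase letters are kept here.
    return text
      .replace(/[^A-Za-z0-9\s'-]/g, " ")
      .split(/\s+/).map(t=>t.trim()).filter(Boolean);
  }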

  function buildNgrams(tokens, n){
    if (n===1) return tokens;
    const grams = [];
    for (let i=0;i<=tokens.length-n;i++){
      grams.push(tokens.slice(i,i+n).join(" "));
    }
    return grams;
  }

  // -------- Frequency + filtering --------
  function freqCount(text, lang, ngram, doLower, removeStop){
    let t = text;
    if (doLower && lang==="English") t = t.toLowerCase();
    let tokens = tokenize(t, lang);

    if (stripShort.value && lang==="English"){
      tokens = tokens.filter(w => w.length > 2); // drop very short tokens
    }

    const n = ngram==="unigram" ? 1 : (ngram==="bigram" ? 2 : 3);
    const grams = buildNgrams(tokens, n);

    const stop = (lang==="Thai") ? STOP_TH : STOP_EN;
    const f = new Map();
    for (const g of grams){
      if (removeStop && n===1 && stop.has(g)) continue;
      if (removeStop && n>1){
        const parts = g.split(" ");
        if (parts.every(w => stop.has(w))) continue;
      }
      f.set(g, (f.get(g)||0)+1);
    }
    return f;
  }

  // -------- Layout (spiral + collision) --------
  function rand(seed=Date.now()){
    let s = seed >>> 0;
    return function(){
      s = Math.imul(1664525, s) + 1013904223 | 0;
      return (s>>>0) / 4294967296;
    };
  }

  function placeWords(words, W, H, rotateMode, rng, maxTrials=3500){
    const placed = [];
    const ctx = document.createElement("canvas").getContext("2d");

    function measure(t, fontPx){
      ctx.font = `${Math.round(fontPx)}px system-ui, -apple-system, Segoe UI, Roboto`;
      const w = ctx.measureText(t).width;
      const h = fontPx;
      return [w, h];
    }
    function pickAngle(){
      if (rotateMode==="none") return 0;
      if (rotateMode==="±30°") return (rng()<0.5 ? -1 : 1) * (Math.PI/6);
      if (rotateMode==="±60°") return (rng()<0.5 ? -1 : 1) * (Math.PI/3);
      const deg = [-90,-60,-30,0,30,60,90][Math.floor(rng()*7)];
      return deg * Math.PI/180;
    }
    function collide(r, others){
      for (const o of others){
        if (r.x + r.w < o.x || o.x + o.w < r.x || r.y + r.h < o.y || o.y + o.h < r.y) continue;
        return true;
      }
      return false;
    }

    const centerX = W/2, centerY = H/2;
    for (const w of words){
      const angle = pickAngle();
      const [w0, h0] = measure(w.text, w.size);
      const cos = Math.cos(angle), sin = Math.sin(angle);
      const wRot = Math.abs(w0*cos) + Math.abs(h0*sin);
      const hRot = Math.abs(w0*sin) + Math.abs(h0*cos);

      let success = false;
      for (let t=0; t<maxTrials; t++){
        const r = 2 + 4 * (t/20);
        const th = t * 0.15;
        const x = centerX + r * Math.cos(th) - wRot/2;
        const y = centerY + r * Math.sin(th) - hRot/2;
        const cand = {x, y, w:wRot+2, h:hRot+2};
        if (x<0 || y<0 || x+wRot>W || y+hRot>H) continue;
        if (!collide(cand, placed.map(p=>p.rect))){
          placed.push({rect:cand, text:w.text, size:w.size, angle, color:w.color});
          success = true;
          break;
        }
      }
      if (!success) {
        // If no free position was found, skip the word (avoids stalling when there are many words).
      }
      // Soft cap for speed (several hundred words are still supported).
      if (placed.length > 1200) break;
    }
    return placed;
  }

  // -------- KWIC --------
  function kwic(text, term, window=30){
    const re = new RegExp(term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "gi");
    const out = [];
    let m;
    while ((m = re.exec(text))!==null){
      const i = m.index, j = re.lastIndex;
      out.push({
        pre: text.slice(Math.max(0, i-window), i),
        hit: text.slice(i, j),
        post: text.slice(j, Math.min(text.length, j+window))
      });
      if (out.length>=30) break;
    }
    return out;
  }

  // -------- Render cloud --------
  let seed = Date.now();

  function buildCloud(shuffle=false){
    if (shuffle) seed = Date.now();

    const text = txtArea.value || "";
    const lang = langSel.value;
    const ngram = ngSel.value;

    const f = freqCount(text, lang, ngram, !!caseFold.value, !!rmStop.value);
    let rows = Array.from(f.entries()).map(([term, count]) => ({term, count}));

    rows = rows.sort((a,b)=> b.count - a.count);
    let used = rows.filter(r => r.count >= +minFreq.value).slice(0, +maxWords.value);

    let note = "";
    if (used.length === 0) {
      used = rows.slice(0, Math.min(+maxWords.value, 500));
      note = ` (fallback: no tokens ≥ Min frequency; showing top ${used.length})`;
    }

    const counts = used.map(d=>d.count);
    const [c0, c1] = extent(counts.length ? counts : [1,1]);
    const scale = makeScale(scaleSel.value, [c0, c1], [+minFont.value, +maxFont.value]);

    const words = used.map((r,i)=>({
      text: r.term,
      size: scale(r.count),
      color: palette[i % palette.length]
    }));

    const W = cloudWrap.clientWidth || 860;
    const H = Math.max(560, cloudWrap.clientHeight || 560);
    const rng = rand(seed);
    const placed = placeWords(words, W, H, rotateOpt.value, rng);

    cloudWrap.innerHTML = "";
    if (placed.length === 0){
      const empty = document.createElement("div");
      empty.className = "empty";
      empty.innerHTML = `No tokens to display. Try lowering <b>Min frequency</b>, turning off <b>Remove stopwords</b>, or increasing <b>Max words</b>.`;
      cloudWrap.append(empty);
    } else {
      for (const w of placed){
        const span = document.createElement("span");
        span.className = "token";
        span.textContent = w.text;
        span.style.left = `${w.rect.x}px`;
        span.style.top  = `${w.rect.y}px`;
        span.style.fontSize = `${Math.round(w.size)}px`;
        span.style.color = w.color;
        span.style.transform = `rotate(${(w.angle*180/Math.PI).toFixed(1)}deg)`;
        span.title = `${w.text}`;
        span.addEventListener("click", () => {
          for (const el of cloudWrap.querySelectorAll(".token")) el.style.opacity = ".35";
          span.style.opacity = "1";
          const kw = kwic(text, w.text, 40);
          // Escape &, < and > so user text cannot inject markup into the KWIC box.
          const esc = s => s.replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;");
          kwicBox.innerHTML = kw.length
            ? kw.map(k => `${esc(k.pre)}<b>${esc(k.hit)}</b>${esc(k.post)}`).join("<br>")
            : `<span class="hint">No occurrences found (tokenizer/stopwords may have filtered it).</span>`;
        });
        cloudWrap.append(span);
      }
    }

    stats.innerHTML = `Tokens shown: <b>${placed.length}</b>${note} • Vocab: <b>${rows.length}</b> • Min freq ≥ ${+minFreq.value} • N-gram: <b>${ngram}</b>`;
    kwicBox.innerHTML = `<span class="hint">Click a word to see KWIC (up to 30 hits).</span>`;
  }

  // auto rebuild on change (debounced)
  let timer=null;
  function scheduleBuild(){ clearTimeout(timer); timer=setTimeout(()=>buildCloud(false), 120); }
  [langSel, ngSel, txtArea, caseFold, rmStop, stripShort, rotateOpt, scaleSel, maxWords, minFreq, minFont, maxFont]
    .forEach(el => el.addEventListener("input", scheduleBuild));
  runBtn.addEventListener("click", () => buildCloud(false));
  shuffleBtn.addEventListener("click", () => buildCloud(true));
  window.addEventListener("resize", scheduleBuild, {passive:true});

  // first render
  buildCloud(false);
  return box;
})()
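
The cell above turns term counts into font sizes through `extent` and `makeScale`, whose definitions are not shown in this section. The following is a minimal sketch of how such helpers might look, not the page's actual implementation. The mode names `"linear"`, `"sqrt"`, and `"log"`, the clamping, and the midpoint fallback for a flat count range are all assumptions made for illustration.

```javascript
// Hypothetical stand-ins for the `extent` and `makeScale` helpers used above.
// Their real definitions live elsewhere on the page and may differ.

// Return [min, max] of a non-empty array of numbers.
function extent(values) {
  let lo = Infinity, hi = -Infinity;
  for (const v of values) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  return [lo, hi];
}

// Build a function mapping a count in [d0, d1] to a font size in [r0, r1].
function makeScale(kind, [d0, d1], [r0, r1]) {
  // Transform applied to counts before interpolating (assumed mode names).
  const f = kind === "log"  ? (x) => Math.log(Math.max(x, 1))
          : kind === "sqrt" ? (x) => Math.sqrt(Math.max(x, 0))
          : (x) => x; // "linear" and any unrecognized mode
  const a = f(d0), b = f(d1);
  return (x) => {
    if (b === a) return (r0 + r1) / 2; // all counts equal: use the midpoint size
    const t = Math.min(1, Math.max(0, (f(x) - a) / (b - a))); // clamp to [0, 1]
    return r0 + t * (r1 - r0);
  };
}

// Usage: four terms with very different counts, sized between 12px and 72px.
const counts = [1, 4, 9, 100];
const scale = makeScale("sqrt", extent(counts), [12, 72]);
const sizes = counts.map((c) => Math.round(scale(c)));
console.log(sizes);
```

A square-root or logarithmic transform is a common choice for word clouds because raw counts are usually heavily skewed: under a linear scale, one very frequent term would push almost every other word to the minimum font size.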
