Non-Programming Software for Data Mining

Somsak Chanaim

International College of Digital Innovation, CMU

June 18, 2025

All Software Tools Used in This Course

Excel

Microsoft Excel is a spreadsheet program developed by Microsoft, used for storing, analyzing, and visualizing data in tabular form.

What Can Excel Be Used For?

  • Data Management: Store data in rows and columns

  • Mathematical Calculations: Use built-in formulas and functions such as SUM, AVERAGE, and IF

  • Charts & Graphs: Visualize data using bar charts, line charts, and more

  • Data Analysis: Tools like PivotTables, Data Validation, and Conditional Formatting support in-depth analysis

  • Automation: Automate repetitive tasks using Macros and VBA (Visual Basic for Applications)
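For readers who later move on to the programming tools in this course, the built-in functions named above have close Python counterparts. The sketch below is only an illustration: the `scores` list, the pass mark of 50, and the cell range A1:A5 in the comments are made-up values, not part of the course material.

```python
from statistics import mean

# Hypothetical column of exam scores, as if stored in cells A1:A5.
scores = [72, 45, 88, 51, 39]

total = sum(scores)       # comparable to =SUM(A1:A5)
average = mean(scores)    # comparable to =AVERAGE(A1:A5)

# Comparable to =IF(A1>=50, "Pass", "Fail") filled down the column.
labels = ["Pass" if s >= 50 else "Fail" for s in scores]

print(total, average, labels)
```

In Excel the same results come from typing the formulas into cells; no code is needed.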

What are Excel file extensions?

  • .xlsx: Standard Excel file format

  • .xls: Legacy Excel format (pre-2007)

  • .csv: Plain text format with comma-separated values
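To make "plain text format with comma-separated values" concrete, here is a minimal sketch that parses a tiny CSV text with Python's standard `csv` module. The column names and records are invented for the example, and the text is held in a string rather than a real file so the snippet runs on its own.

```python
import csv
import io

# Made-up contents of a small .csv file: a header row, then one record per line.
csv_text = "name,age,city\nAnan,21,Chiang Mai\nMali,19,Bangkok\n"

# io.StringIO lets the string be read as if it were an open file.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

for row in rows:
    print(row["name"], row["age"], row["city"])
```

Every value is read as text (for example `"21"`, not the number 21), which is one reason spreadsheet and statistics programs ask you to confirm column types when importing CSV files.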

Jamovi

Jamovi is an open-source statistical analysis program designed to be user-friendly. It is similar to SPSS but free to use.

Key Features of Jamovi

  • User-Friendly Interface: Spreadsheet-style interface similar to Excel and SPSS

  • Supports Statistical Analysis: Includes t-tests, ANOVA, regression, chi-square tests, and more

  • Data Visualization: Supports bar charts, scatter plots, histograms, and other visualizations

  • R Integration: Extendable with R code via the Rj module

  • Free and Open-Source: No licensing fees required
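As a rough idea of what happens behind a t-test menu, the sketch below computes Welch's t statistic for two small groups using only Python's standard library. The two groups are invented numbers, and this is just one variant of the test; it is not a description of how Jamovi computes its results.

```python
from math import sqrt
from statistics import mean, variance

# Two hypothetical groups of measurements (made-up data).
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [4.2, 4.8, 4.5, 4.1, 4.4]

# Welch's t: difference in means divided by its estimated standard error.
# variance() is the sample variance (divides by n - 1).
standard_error = sqrt(variance(group_a) / len(group_a)
                      + variance(group_b) / len(group_b))
t_statistic = (mean(group_a) - mean(group_b)) / standard_error

print(round(t_statistic, 3))
```

A statistics package would also report degrees of freedom and a p-value; those steps are left out here to keep the sketch short.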

What file types does Jamovi support?

  • .omv: Jamovi project file

  • Can import .csv, .xlsx, .sav (SPSS), and .txt files

Orange Data Mining

Orange is open-source software for data analysis and data mining, featuring a simple drag-and-drop interface.

Key Features of Orange

  • User-Friendly GUI: Diagram-based interface, no coding required

  • Advanced Data Analysis: Supports clustering, principal component analysis (PCA), classification, and more

  • Machine Learning Support: Includes models like decision trees, SVM, and neural networks

  • Data Visualization: Tools for attractive plots like scatter plots, heatmaps, and box plots

  • Python Scripting Support: Works with scikit-learn and pandas

  • Free and Open-Source: Available on Windows, macOS, and Linux
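Orange hides its algorithms behind a workflow diagram, so it can help to see what one of the techniques listed above does. The following is a minimal sketch of k-means clustering in plain Python, assuming two clusters and fixed starting centroids. The function name `k_means`, the data points, and the starting centroids are all hypothetical, and this is not Orange's implementation.

```python
from math import dist
from statistics import mean


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(mean(coord) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up 2-D data forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [(0.0, 0.0), (10.0, 10.0)])

print(centroids)
print(clusters)
```

In Orange the same kind of analysis is built by connecting a data widget to a clustering widget, with no code involved.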

What file formats does Orange support?

  • .csv, .xlsx – Tabular data

  • .tab, .txt – Text files

  • Connects to SQL databases

Programming Tools to Boost Your Data Skills

R and Python are two of the most popular programming languages for data analysis, data science, and artificial intelligence and machine learning (AI/ML).

Each language has its own strengths and advantages depending on the task.

What is R?

R is a programming language specifically designed for statistics and data analysis. It is widely used in statistical computing, data science, and academic research.

Key Features of R

Ideal for Statistics and Data Analysis: Offers a rich package ecosystem, including the tidyverse collection (which contains ggplot2 and dplyr) and caret.

Beautiful Data Visualization: Easily create elegant plots using ggplot2 and interactive visuals with plotly.

Supports Machine Learning & AI: Popular libraries include caret, mlr, and randomForest.

Widely Used in Research and Academia: Commonly adopted in economics, social sciences, and biostatistics.

Dynamic Reporting Capabilities: Integrates seamlessly with Quarto, R Markdown, and Shiny for interactive reports and dashboards.

File Extensions in R

  • .R: R script file

  • .Rmd: R Markdown file

  • .qmd: Quarto Markdown file

You can run R code

Python

What is Python?

Python is a highly popular programming language known for its versatility. It is widely used in Data Science, AI/ML, Web Development, and Automation.

Key Features of Python

Easy to Read and Use: Python has a simple syntax that is beginner-friendly and readable.

Well-Suited for Machine Learning & AI: Popular libraries include scikit-learn, TensorFlow, and PyTorch.

Supports Powerful Data Analysis: Common tools include pandas, numpy, matplotlib, and seaborn.

Great for Web Development: Frameworks include Flask, Django, and FastAPI.

Extensive Library Ecosystem: Covers many domains, such as image processing (OpenCV) and NLP (NLTK, spaCy).

Python File Types

  • .py: Python script

  • .ipynb: Jupyter Notebook file
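The two file types differ in structure: a `.py` file is plain Python source text, while a `.ipynb` file is a JSON document that stores a list of cells. The sketch below builds a minimal notebook structure by hand to show this. It assumes notebook format version 4, and the cell content is made up; real notebooks written by Jupyter carry more metadata.

```python
import json

# A minimal notebook structure (format version 4): cells plus metadata.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, data mining!')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialising to text shows the kind of content stored in a .ipynb file.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back: collect the source code of every code cell.
loaded = json.loads(ipynb_text)
code_cells = ["".join(cell["source"])
              for cell in loaded["cells"] if cell["cell_type"] == "code"]

print(code_cells)
```

This is why a notebook opened in a plain text editor looks like nested brackets and quotes rather than a script.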

You can run Python code

R vs Python: A Comparison

Feature | R 🟦 | Python 🟧
Data Analysis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐
Data Visualization | ⭐⭐⭐⭐⭐ | ⭐⭐⭐
Statistical Analysis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐
Machine Learning | ⭐⭐⭐ | ⭐⭐⭐⭐⭐
Deep Learning | ⭐⭐ | ⭐⭐⭐⭐⭐
Web Development | (not rated) | ⭐⭐⭐⭐⭐
Big Data Integration | ⭐⭐ | ⭐⭐⭐⭐⭐
Beginner-Friendliness | ⭐⭐⭐ | ⭐⭐⭐⭐

Which Should You Choose?

  • If your focus is on statistical analysis or academic research → choose R

  • If you’re aiming for Machine Learning, AI, or app development → choose Python

  • If you want both → you can combine R and Python, for example with the reticulate package in R or rpy2 in Python