vivekkdagar/pyquantify

pyquantify

Pyquantify is a CLI tool for semantic analysis of text. It uses natural language processing to analyze text from raw input, files, or websites, and supports visualizing and exploring the results.

Badges

Pyquantify Version

GPLv3 License

Python 3

Features

  1. Text Summarization:

    • Utilizes the BERT model for summarizing text.
    • Provides caching functionality to speed up summarization for previously processed text.
    • Supports exporting summaries to text files.
  2. Text Analysis:

    • Preprocesses text data including tokenization and part-of-speech tagging.
    • Generates various metrics such as character count, word count, sentence count, etc.
    • Analyzes morphological data including lemmatized forms, part-of-speech tags, and word frequencies.
    • Performs sentiment analysis using the TextBlob library.
    • Visualizes data through word clouds and word frequency charts.
  3. Text Processing:

    • Offers functionality for cleaning and preprocessing text data.
    • Implements functions for generating word clouds and word frequency charts.
    • Calculates cosine similarity between two texts.
  4. Data Loading and Exporting:

    • Supports loading text data from raw input, files, or websites.
    • Provides export functionality for analyzed data, summaries, sentiment analysis results, and keywords extracted from text.
  5. CLI Interface:

    • Implements a command-line interface (CLI) using the Click library.
    • Offers commands for various text analysis and summarization tasks, including data visualization and sentiment analysis.
    • Provides options for specifying data loading mode and exporting analysis results.
  6. Parallel Processing:

    • Utilizes multiprocessing and concurrent.futures to run tasks such as sentiment analysis and summarization in parallel, improving performance.
  7. Unit Testing:

    • Includes unit tests for different modules and functionalities using the unittest framework.
    • Uses mocking to isolate and test individual components such as data loading, summarization, and exporting.
  8. Exception Handling and Error Reporting:

    • Handles exceptions gracefully and provides informative error messages.
    • Reports errors such as unsupported operating systems, file not found, and invalid input modes.
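
The caching behaviour listed under Text Summarization can be sketched roughly as follows. This is not pyquantify's own code: `SummaryCache` and `first_sentence` are hypothetical names, the stand-in summarizer is a trivial function rather than BERT, and the cache lives in memory only. It illustrates the general idea of keying stored summaries by a hash of the input text so that repeated requests skip the expensive model call.

```python
import hashlib
from typing import Callable, Dict, List


class SummaryCache:
    """Memoize summaries by a hash of the input text (hypothetical helper)."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize
        self._store: Dict[str, str] = {}

    def get(self, text: str) -> str:
        # Hash the text so the key stays small even for long documents.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: run the (potentially slow) summarizer once.
            self._store[key] = self._summarize(text)
        return self._store[key]


calls: List[str] = []


def first_sentence(text: str) -> str:
    """Stand-in summarizer (assumption): returns the first sentence, not a BERT summary."""
    calls.append(text)
    return text.split(".")[0].strip() + "."


cache = SummaryCache(first_sentence)
doc = "Caching avoids repeated work. The second call is served from memory."
print(cache.get(doc))  # computed by the summarizer
print(cache.get(doc))  # served from the cache
print(len(calls))      # number of times the summarizer actually ran
```

A persistent cache would write `_store` to disk instead; whether pyquantify does so is not stated here.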

Installation

Install pyquantify with pip

  pip install pyquantify

Alternatively, build it locally:

Clone the project

  git clone https://github.com/vivekkdagar/pyquantify.git

Go to the project directory

  cd pyquantify

Build the package:

  python3 -m build

Install the package:

  pip install dist/*gz

Before running pyquantify

Clone the project

  git clone https://github.com/vivekkdagar/pyquantify.git

Go to the project directory

  cd pyquantify

Run the nltk_datasets.py script, located in the scripts directory

  python3 scripts/nltk_datasets.py

Download the English model for spaCy

  python3 -m spacy download en_core_web_sm
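
If it is unclear whether the setup steps succeeded, a small check like the one below may help. It is a sketch, not part of pyquantify: `check_prerequisites` is a hypothetical helper, and it only tests whether the `nltk`, `spacy`, and `en_core_web_sm` packages are importable. It does not verify that the NLTK datasets themselves were downloaded.

```python
import importlib.util
from typing import Dict, Iterable

# Packages the setup steps above are expected to make importable.
REQUIRED_MODULES = ("nltk", "spacy", "en_core_web_sm")


def check_prerequisites(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Report, per top-level module name, whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'missing'}")
```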

Usage/Examples

Pyquantify provides several commands for analyzing and visualizing text data. Below is a guide on how to use the key functionalities:

  1. Search for a Specific Word in Morphological Analysis:

    pyquantify search-word --mode [raw/file/website] --word [desired_word]
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --word: Specify the word you want to search for.
  2. Generate Word Frequency Plot:

    pyquantify visualize --mode [raw/file/website] --freq-chart --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --freq-chart: Flag to generate word frequency chart.
    • --export: Optional flag to export the frequency plot to a file.
  3. Generate Word Cloud:

    pyquantify visualize --mode [raw/file/website] --wordcloud --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --wordcloud: Flag to generate word cloud.
    • --export: Optional flag to export the word cloud to a file.
  4. Text Analysis and Metrics Generation:

    pyquantify analyze --mode [raw/file/website] --n [number_of_rows] --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --n: Optional parameter to display a specific number of rows in the analysis.
    • --export: Optional flag to export the analysis results to files.
  5. Summarize Text:

    pyquantify summarize --mode [raw/file/website] --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --export: Optional flag to export the summary to a file.
  6. Sentiment Analysis:

    pyquantify sentiment-analysis --mode [raw/file/website] --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --export: Optional flag to export the sentiment analysis results to a file.
  7. View the pyquantify Git Page:

    pyquantify git
  8. Extract Keywords from the Data:

    pyquantify keywords --mode [raw/file/website] --export
    • --mode: Specify the data loading mode (raw input, file, or website).
    • --export: Optional flag to export the extracted keywords to a file.
  9. Calculate Cosine Similarity:

    pyquantify similarity --mode [raw/file/website] --other [raw/file/website]
    • --mode: Specify the data loading mode for the first text (raw input, file, or website).
    • --other: Specify the data loading mode for the second text (raw input, file, or website).
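
For readers unfamiliar with the measure, cosine similarity between two texts can be sketched as below. This is an illustration under assumptions, not pyquantify's implementation: it uses raw word counts as vectors and a simple regex tokenizer, whereas the tool may preprocess or weight terms differently. The score is the dot product of the two count vectors divided by the product of their lengths, so it ranges from 0 (no shared words) to 1 (identical word distributions).

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of the bag-of-words count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("the cat sat", "the cat ran"), 3))
```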

To see the full set of options a command accepts, check its help output:

pyquantify [command] --help
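
To review the help text for every command in one pass, a short script along these lines could be used. It is only a sketch: `help_argv` and `show_all_help` are hypothetical helpers, the command names are copied from the usage list above, and the script assumes the `pyquantify` executable is on PATH.

```python
import shutil
import subprocess
from typing import List

# Command names exactly as they appear in the usage list above.
COMMANDS = [
    "search-word",
    "visualize",
    "analyze",
    "summarize",
    "sentiment-analysis",
    "git",
    "keywords",
    "similarity",
]


def help_argv(command: str) -> List[str]:
    """Build the argument list for `pyquantify <command> --help`."""
    return ["pyquantify", command, "--help"]


def show_all_help() -> bool:
    """Print help for each command; return False if pyquantify is not installed."""
    if shutil.which("pyquantify") is None:
        print("pyquantify is not on PATH; install it first.")
        return False
    for command in COMMANDS:
        # check=False: keep going even if one command exits with an error.
        subprocess.run(help_argv(command), check=False)
    return True
```

Calling `show_all_help()` prints each command's help in turn.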

FAQ

Q: What is Pyquantify?

Pyquantify is a tool designed for in-depth analysis of textual data, focusing on extracting meaning and linguistic insights. It provides features like word frequency, morphology, and metrics generation, enhancing data exploration and visualization.
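
As a rough illustration of what word frequency and metrics generation mean in practice, the sketch below counts characters, words, and sentences and lists the most frequent words. It is a simplified stand-in using only the standard library: `text_metrics` is a hypothetical function, and pyquantify's own analysis relies on NLP libraries for tokenization and tagging rather than regular expressions.

```python
import re
from collections import Counter
from typing import Any, Dict


def text_metrics(text: str, top_n: int = 2) -> Dict[str, Any]:
    """Compute simple counts and the most frequent words of a text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Treat runs of ., ! or ? as sentence boundaries and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "top_words": Counter(words).most_common(top_n),
    }


metrics = text_metrics("The cat sat. The cat ran!")
for name, value in metrics.items():
    print(f"{name}: {value}")
```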

Q: Why Develop Pyquantify as a Semantic Profiler?

Pyquantify was created for the DSA subject in the fifth semester of college. The goal was to offer a versatile NLP tool, empowering users to analyze and profile text efficiently. The tool's features aim to deepen understanding and exploration of linguistic aspects within textual data.

Q: Why Did Pyquantify Evolve from a Word Frequency Counter?

Originally conceived as a word frequency counter, Pyquantify's development took a different direction. The decision to expand its capabilities was driven by the desire to create a more comprehensive tool for natural language processing. The project evolved to encompass semantic profiling, offering a richer set of features such as morphology analysis, metrics generation, and enhanced data visualization. This shift aimed to provide users with a more powerful and versatile solution for exploring and understanding textual data beyond simple word frequency analysis.

Q: Why the name change from NLPFreq to Pyquantify?

NLPFreq felt limiting and didn't capture the full scope of the project. Pyquantify more accurately reflects its capabilities as a Python-based tool for quantitative data analysis.
