Pyquantify is a command-line tool for semantic analysis of text. It uses natural language processing to summarize, analyze, and visualize text loaded from raw input, files, or websites.
- Features
- Installation
  - Install pyquantify with pip
  - Or you can build locally
- Before running pyquantify
- Usage/Examples
- FAQ
- Screenshots
- Acknowledgements
- Text Summarization:
  - Utilizes the BERT model for summarizing text.
  - Provides caching functionality to speed up summarization for previously processed text.
  - Supports exporting summaries to text files.
- Text Analysis:
  - Preprocesses text data including tokenization and part-of-speech tagging.
  - Generates various metrics such as character count, word count, sentence count, etc.
  - Analyzes morphological data including lemmatized forms, part-of-speech tags, and word frequencies.
  - Performs sentiment analysis using the TextBlob library.
  - Visualizes data through word clouds and word frequency charts.
- Text Processing:
  - Offers functionality for cleaning and preprocessing text data.
  - Implements functions for generating word clouds and word frequency charts.
  - Calculates cosine similarity between two texts.
- Data Loading and Exporting:
  - Supports loading text data from raw input, files, or websites.
  - Provides export functionality for analyzed data, summaries, sentiment analysis results, and keywords extracted from text.
- CLI Interface:
  - Implements a command-line interface (CLI) using the Click library.
  - Offers commands for various text analysis and summarization tasks, including data visualization and sentiment analysis.
  - Provides options for specifying the data loading mode and exporting analysis results.
- Parallel Processing:
  - Utilizes multiprocessing and concurrent.futures to run tasks such as sentiment analysis and summarization in parallel, improving performance.
- Unit Testing:
  - Includes unit tests for different modules and functionalities using the unittest framework.
  - Uses mocking to isolate and test individual components such as data loading, summarization, and exporting.
- Exception Handling and Error Reporting:
  - Handles exceptions gracefully and provides informative error messages.
  - Reports errors such as unsupported operating systems, file not found, and invalid input modes.
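The metrics and word-frequency features listed above can be illustrated with a minimal standard-library sketch. This is not Pyquantify's implementation, which according to the list above relies on NLP libraries for tokenization and tagging; the function name `basic_metrics` and the regex-based tokenization are assumptions made purely for illustration.

```python
import re
from collections import Counter


def basic_metrics(text: str, top_n: int = 3) -> dict:
    """Compute simple text metrics (hypothetical helper, not part of pyquantify)."""
    # Naive tokenization: runs of letters, digits, and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
        "top_words": Counter(words).most_common(top_n),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran! Did the dog run?"
    print(basic_metrics(sample))
```

A real tokenizer handles abbreviations, hyphenation, and non-ASCII text far better than these two regular expressions; the sketch only shows the shape of the output such an analysis produces.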
Install pyquantify with pip:
pip install pyquantify
Or build it locally:
Clone the project
git clone https://github.com/vivekkdagar/pyquantify.git
Go to the project directory
cd pyquantify
Build the package:
python3 -m build
Install the package:
pip install dist/*gz
Before running pyquantify, download the NLTK and spaCy data it depends on:
Clone the project
git clone https://github.com/vivekkdagar/pyquantify.git
Go to the project directory
cd pyquantify
Run the script nltk_datasets.py in the scripts directory
python3 scripts/nltk_datasets.py
Download the English model for spaCy
python3 -m spacy download en_core_web_sm
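After these steps, a quick check that the pieces are importable can save a confusing first run. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the module names it checks are assumptions based on the steps above (a spaCy model installed through `spacy download` is importable as a package of the same name). It does not verify that the NLTK datasets themselves were downloaded.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names: nltk, spacy, and the en_core_web_sm model package.
    missing = missing_modules(["nltk", "spacy", "en_core_web_sm"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules were found.")
```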
Pyquantify provides several commands for analyzing and visualizing text data. Below is a guide on how to use the key functionalities:
- Search for a Specific Word in Morphological Analysis:
  pyquantify search-word --mode [raw/file/website] --word [desired_word]
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --word: Specify the word you want to search for.
- Generate Word Frequency Plot:
  pyquantify visualize --mode [raw/file/website] --freq-chart --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --freq-chart: Flag to generate word frequency chart.
  - --export: Optional flag to export the frequency plot to a file.
- Generate Word Cloud:
  pyquantify visualize --mode [raw/file/website] --wordcloud --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --wordcloud: Flag to generate word cloud.
  - --export: Optional flag to export the word cloud to a file.
- Text Analysis and Metrics Generation:
  pyquantify analyze --mode [raw/file/website] --n [number_of_rows] --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --n: Optional parameter to display a specific number of rows in the analysis.
  - --export: Optional flag to export the analysis results to files.
- Summarize Text:
  pyquantify summarize --mode [raw/file/website] --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --export: Optional flag to export the summary to a file.
- Sentiment Analysis:
  pyquantify sentiment-analysis --mode [raw/file/website] --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --export: Optional flag to export the sentiment analysis results to a file.
- View the pyquantify git page:
  pyquantify git
- Extract keywords from the data:
  pyquantify keywords --mode [raw/file/website] --export
  - --mode: Specify the data loading mode (raw input, file, or website).
  - --export: Optional flag to export the extracted keywords to a file.
- Calculate Cosine Similarity:
  pyquantify similarity --mode [raw/file/website] --other [raw/file/website]
  - --mode: Specify the data loading mode for the first text (raw input, file, or website).
  - --other: Specify the data loading mode for the second text (raw input, file, or website).
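For readers unfamiliar with the measure behind the similarity command, cosine similarity compares two texts by the angle between their word-count vectors: the dot product divided by the product of the vector norms. The following is a minimal sketch, assuming plain bag-of-words counts and regex tokenization. Pyquantify's own vectorization is not described here and may differ, and `cosine_similarity` is a hypothetical helper rather than part of its API.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using raw word counts (illustrative only)."""
    counts_a = Counter(re.findall(r"\w+", text_a.lower()))
    counts_b = Counter(re.findall(r"\w+", text_b.lower()))
    # Dot product over the words that appear in both texts.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction; treat the similarity as 0.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("the cat sat on the mat", "the dog sat on the log"))
```

With count vectors the score ranges from 0 (no shared words) to 1 (identical word distributions).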
Feel free to explore additional options and functionalities by checking the help documentation for each command:
pyquantify [command] --help
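When scripting several runs, it can help to assemble the command line programmatically. The sketch below only builds an argument list from flags documented above (`--mode`, `--export`). The helper `build_command` is hypothetical, and how pyquantify then receives the text, file path, or URL is not covered here, so the sketch prints the commands rather than executing them.

```python
import shlex

VALID_MODES = ("raw", "file", "website")


def build_command(command: str, mode: str, export: bool = False) -> list:
    """Build an argument list for a pyquantify command (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    args = ["pyquantify", command, "--mode", mode]
    if export:
        args.append("--export")
    return args


if __name__ == "__main__":
    for command in ("analyze", "summarize", "keywords"):
        # shlex.join renders the list as a shell-safe string for display.
        print(shlex.join(build_command(command, "file", export=True)))
```

Validating the mode up front mirrors the "invalid input modes" error mentioned in the feature list, but the check here is an assumption about sensible behavior, not a description of the tool's internals.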
Pyquantify is a tool designed for in-depth analysis of textual data, focusing on extracting meaning and linguistic insights. It provides features like word frequency, morphology, and metrics generation, enhancing data exploration and visualization.
Pyquantify was created for the DSA subject in the fifth semester of college. The goal was to offer a versatile NLP tool, empowering users to analyze and profile text efficiently. The tool's features aim to deepen understanding and exploration of linguistic aspects within textual data.
Originally conceived as a word frequency counter, Pyquantify's development took a different direction. The decision to expand its capabilities was driven by the desire to create a more comprehensive tool for natural language processing. The project evolved to encompass semantic profiling, offering a richer set of features such as morphology analysis, metrics generation, and enhanced data visualization. This shift aimed to provide users with a more powerful and versatile solution for exploring and understanding textual data beyond simple word frequency analysis.
The earlier name, NLPFreq, felt limiting and didn't capture the full scope of the project. Pyquantify more accurately reflects its capabilities as a Python-based tool for quantitative data analysis.