GitHub - algernon28/pdf-analyzer

This Spring Boot command-line application checks PDF documents for grammar issues and typos, compares different versions of a PDF, and includes placeholder checks for layout and semantics. It generates an Allure report to visualize the analysis results.

The application is structured with a main PdfAnalyzerRunner that orchestrates tasks performed by specialized classes: TemplatePdfAnalyzerTask, CompiledVsTemplateComparisonTask, and InternalComparisonTask. Allure reporting utilities are centralized in AllureReportUtil.
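
As a rough illustration of this orchestration, the sketch below runs a list of tasks in a fixed order and records one outcome per task. The names AnalysisTask, RunnerSketch, runAll, and task are hypothetical and are not taken from the repository. Continuing after a failed task is also an assumption; the real PdfAnalyzerRunner may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a runner that executes analysis tasks in order.
// All names here are hypothetical, not the project's actual classes.
final class RunnerSketch {

    // Hypothetical task contract: a display name plus a unit of work.
    interface AnalysisTask {
        String name();
        void run() throws Exception;
    }

    // Runs each task in turn. A failure is recorded and the remaining
    // tasks still run (an assumption, not confirmed by the project).
    static List<String> runAll(List<AnalysisTask> tasks) {
        List<String> outcomes = new ArrayList<>();
        for (AnalysisTask task : tasks) {
            try {
                task.run();
                outcomes.add(task.name() + ": ok");
            } catch (Exception e) {
                outcomes.add(task.name() + ": failed (" + e.getMessage() + ")");
            }
        }
        return outcomes;
    }

    // Small factory so the example does not need one class per task.
    static AnalysisTask task(String name, Runnable body) {
        return new AnalysisTask() {
            @Override public String name() { return name; }
            @Override public void run() { body.run(); }
        };
    }

    public static void main(String[] args) {
        List<AnalysisTask> tasks = List.of(
            task("Template PDF Analysis", () -> {}),
            task("Compiled vs. Template Comparison", () -> {}),
            task("Internal Compiled PDF Consistency", () -> {}));
        runAll(tasks).forEach(System.out::println);
    }
}
```

Keeping each task behind one small interface lets the runner stay unaware of what a task does, which matches the split into specialized task classes described above.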

Prerequisites

Java 17 JDK or newer
Apache Maven (for building)
Allure Commandline Tool (for generating HTML reports)
- Installation: https://allurereport.org/docs/gettingstarted-installation/

Project Structure

cli/PdfAnalyzerRunner.java: Main CLI orchestrator.
cli/AllureReportUtil.java: Static helpers for Allure reporting.
cli/TemplatePdfAnalyzerTask.java: Handles analysis of the template PDF.
cli/CompiledVsTemplateComparisonTask.java: Handles comparison of compiled PDF vs. template.
cli/InternalComparisonTask.java: Handles comparison of sections within the compiled PDF.
config/PdfConfiguration.java: Application configurations.
model/: Data models (GrammarIssue, ParagraphInfo, WordInfo).
services/: Core PDF processing services.

The application.yaml file in src/main/resources/ should contain configurations for PDF processing, highlighting, LanguageTool, and CSV reporting.

Building the Application

To build the executable JAR:

mvn clean package

This will produce a JAR file in the target/ directory (e.g., pdf-analyzer-0.0.1-SNAPSHOT.jar).

Running the Application

Execute the JAR from your terminal.

Usage:

java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles]

No arguments:
- The application will look for template.pdf and compiled.pdf in the same directory as the JAR file.
- Output files will be saved to a directory named pdf_analysis_output created in the JAR's directory.
```
java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar
```

With specific PDF paths:

java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/your/template.pdf" "path/to/your/compiled.pdf"

(The output directory will be pdf_analysis_output in the current working directory.)

With specific PDF paths and output directory:

java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/template.pdf" "path/to/compiled.pdf" "custom_output_folder"

The application creates an allure-results directory in the current working directory (where the command is run), populating it with JSON files required for the Allure report.
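
The handling of the three optional positional arguments described above can be sketched as follows. This is a minimal illustration, not the project's code: the record CliPaths and its resolve method are hypothetical. Falling back to a default for each argument independently is an assumption, since only the zero-, two-, and three-argument forms are documented. The base directory for defaults is passed in by the caller, because the text above names the JAR's directory in one case and the current working directory in another.

```java
import java.nio.file.Path;

// Hypothetical holder for the resolved paths; not part of the repository.
record CliPaths(Path template, Path compiled, Path outputDir) {

    // Resolves [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles].
    // Any argument that is absent falls back to a default under baseDir
    // (per-argument fallback is an assumption of this sketch).
    static CliPaths resolve(String[] args, Path baseDir) {
        Path template = args.length > 0
                ? Path.of(args[0]) : baseDir.resolve("template.pdf");
        Path compiled = args.length > 1
                ? Path.of(args[1]) : baseDir.resolve("compiled.pdf");
        Path outputDir = args.length > 2
                ? Path.of(args[2]) : baseDir.resolve("pdf_analysis_output");
        return new CliPaths(template, compiled, outputDir);
    }

    public static void main(String[] args) {
        // Uses the current working directory as the base for defaults.
        CliPaths paths = resolve(args, Path.of("").toAbsolutePath());
        System.out.println("template:  " + paths.template());
        System.out.println("compiled:  " + paths.compiled());
        System.out.println("outputDir: " + paths.outputDir());
    }
}
```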

Generating and Viewing the Allure Report

After the application finishes:

Generate the report: Navigate to the directory where the JAR was run (where allure-results was created) and execute:
```
allure generate allure-results -o allure-report --clean
```
For a single-file HTML report:
```
allure generate allure-results -o allure-report --clean --single-file
```
Open the report:
```
allure open allure-report
```
Or open allure-report/index.html manually.

Analysis Tasks Reported in Allure

Template PDF Analysis: Grammar, typos, layout (WordInfo CSV), semantics (keyword check).
Compiled vs. Template Comparison: Full text diff.
Internal Compiled PDF Consistency: Compares the "COPIA EDENRED" section against the "COPIA CLIENTE" section (by default, pages 1-3 vs. pages 4-6) with heuristic filtering.
- Limitation: Filtering and page ranges are basic; may need refinement.
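
To make the internal consistency check more concrete, the sketch below compares two page ranges of already-extracted page text line by line. It is only an illustration under stated assumptions: the class InternalComparisonSketch and its methods are hypothetical, the "heuristic filtering" is reduced to dropping blank lines and lines that contain one of the two copy labels, and the comparison is positional. The project's actual filtering rules are not documented here and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; does not use a PDF library and is not the project's code.
final class InternalComparisonSketch {

    // Labels that are expected to differ between the two copies (assumption).
    private static final Set<String> COPY_LABELS =
            Set.of("COPIA EDENRED", "COPIA CLIENTE");

    // Collects trimmed, non-empty lines from pages firstPage..lastPage
    // (1-based, inclusive), skipping lines that contain a copy label.
    static List<String> linesOf(List<String> pageTexts, int firstPage, int lastPage) {
        List<String> lines = new ArrayList<>();
        for (int p = firstPage; p <= lastPage && p <= pageTexts.size(); p++) {
            for (String raw : pageTexts.get(p - 1).split("\\R")) {
                String line = raw.strip();
                boolean isLabel = COPY_LABELS.stream().anyMatch(line::contains);
                if (!line.isEmpty() && !isLabel) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    // Compares the two line lists position by position and describes each mismatch.
    static List<String> differences(List<String> left, List<String> right) {
        List<String> diffs = new ArrayList<>();
        int n = Math.max(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            String a = i < left.size() ? left.get(i) : "<missing>";
            String b = i < right.size() ? right.get(i) : "<missing>";
            if (!a.equals(b)) {
                diffs.add("line " + (i + 1) + ": '" + a + "' vs '" + b + "'");
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Six pages of sample text standing in for extracted PDF content.
        List<String> pages = List.of(
            "COPIA EDENRED\nTotal: 10", "Terms", "Signature",
            "COPIA CLIENTE\nTotal: 12", "Terms", "Signature");
        differences(linesOf(pages, 1, 3), linesOf(pages, 4, 6))
            .forEach(System.out::println);
    }
}
```

A positional comparison like this is sensitive to inserted or removed lines, which is one reason the page ranges and filtering may need the refinement noted above.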

Configuration

Configure via src/main/resources/application.yaml. TODO comments in the code indicate areas for potential future configuration enhancements.
