This Spring Boot command-line application analyzes PDF documents for grammar issues and typos, compares different PDF versions, and includes placeholder checks for layout and semantics. It generates an Allure report to visualize the analysis results.
The application is structured with a main PdfAnalyzerRunner that orchestrates tasks performed by specialized classes: TemplatePdfAnalyzerTask, CompiledVsTemplateComparisonTask, and InternalComparisonTask. Allure reporting utilities are centralized in AllureReportUtil.
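The orchestration described above can be pictured with a minimal sketch. It is illustrative only: the `AnalysisTask` interface, the `runAll` method, and the choice to continue after a failing task are assumptions made for this example, not the actual API of PdfAnalyzerRunner or its task classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the application's real runner. */
final class RunnerSketch {

    /** Hypothetical common shape for the three task classes. */
    interface AnalysisTask {
        String name();
        void run() throws Exception;
    }

    /**
     * Runs every task in the given order and returns the names of the
     * tasks that threw. Continuing after a failure is an assumption of
     * this sketch, so that one broken check does not hide the others.
     */
    static List<String> runAll(List<AnalysisTask> tasks) {
        List<String> failed = new ArrayList<>();
        for (AnalysisTask task : tasks) {
            try {
                task.run();
            } catch (Exception e) {
                failed.add(task.name());
            }
        }
        return failed;
    }
}
```

In such a layout the runner would only need to know the task order, while each task keeps its own PDF logic and reporting calls.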
- Java 17 JDK or newer
- Apache Maven (for building)
- Allure Commandline Tool (for generating HTML reports)
- cli/PdfAnalyzerRunner.java: Main CLI orchestrator.
- cli/AllureReportUtil.java: Static helpers for Allure reporting.
- cli/TemplatePdfAnalyzerTask.java: Handles analysis of the template PDF.
- cli/CompiledVsTemplateComparisonTask.java: Handles comparison of compiled PDF vs. template.
- cli/InternalComparisonTask.java: Handles comparison of sections within the compiled PDF.
- config/PdfConfiguration.java: Application configurations.
- model/: Data models (GrammarIssue, ParagraphInfo, WordInfo).
- services/: Core PDF processing services.
The application.yaml file in src/main/resources/ should contain configurations for PDF processing, highlighting, LanguageTool, and CSV reporting.
To build the executable JAR:
mvn clean package
This will produce a JAR file in the target/ directory (e.g., pdf-analyzer-0.0.1-SNAPSHOT.jar).
Execute the JAR from your terminal.
Usage:
java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar [templatePdfPath] [compiledPdfPath] [outputDirectoryForFiles]
- No arguments:
  - The application will look for template.pdf and compiled.pdf in the same directory as the JAR file.
  - Output files will be saved to a directory named pdf_analysis_output created in the JAR's directory.
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar
- With specific PDF paths:
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/your/template.pdf" "path/to/your/compiled.pdf"
  (Output directory will be pdf_analysis_output in the current working directory.)
- With specific PDF paths and output directory:
  java -jar target/pdf-analyzer-0.0.1-SNAPSHOT.jar "path/to/template.pdf" "path/to/compiled.pdf" "custom_output_folder"
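The three invocation modes above amount to a small defaulting rule. A minimal sketch follows, assuming a hypothetical helper (`CliArgsSketch.resolve`) that is not part of the application. It mirrors the defaults exactly as this README states them. Rejecting any other argument count is an assumption, since the README does not say what happens with one argument or more than three.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; the real runner may resolve its arguments differently. */
final class CliArgsSketch {

    /** The three locations the application needs before it can start. */
    record ResolvedPaths(Path templatePdf, Path compiledPdf, Path outputDir) {}

    static ResolvedPaths resolve(String[] args, Path jarDir, Path workingDir) {
        if (args.length == 0) {
            // No arguments: inputs and output all live next to the JAR.
            return new ResolvedPaths(
                    jarDir.resolve("template.pdf"),
                    jarDir.resolve("compiled.pdf"),
                    jarDir.resolve("pdf_analysis_output"));
        }
        if (args.length != 2 && args.length != 3) {
            // Assumption: other argument counts are treated as a usage error.
            throw new IllegalArgumentException(
                    "Expected 0, 2, or 3 arguments, got " + args.length);
        }
        // Two arguments: output defaults to the current working directory.
        // Three arguments: the third is used as the output directory as given.
        Path outputDir = args.length == 3
                ? Paths.get(args[2])
                : workingDir.resolve("pdf_analysis_output");
        return new ResolvedPaths(Paths.get(args[0]), Paths.get(args[1]), outputDir);
    }
}
```

Passing the JAR directory and the working directory in as parameters keeps the rule easy to test without touching the file system.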
The application creates an allure-results directory in the current working directory (where the command is run), populating it with JSON files required for the Allure report.
After the application finishes:
- Generate the report: Navigate to the directory where the JAR was run (where allure-results was created) and execute:
  allure generate allure-results -o allure-report --clean
  For a single HTML file report:
  allure generate allure-results -o allure-report --clean --single-file
- Open the report:
  allure open allure-report
  Or open allure-report/index.html manually.
- Template PDF Analysis: Grammar, typos, layout (WordInfo CSV), semantics (keyword check).
- Compiled vs. Template Comparison: Full text diff.
- Internal Compiled PDF Consistency: Compares "COPIA EDENRED" vs. "COPIA CLIENTE" (default pages 1-3 vs 4-6) with heuristic filtering.
- Limitation: Filtering and page ranges are basic; may need refinement.
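To make the internal consistency check more concrete, here is a rough sketch of comparing two page ranges after a simple filter. It is not the code of InternalComparisonTask. The class and method names are hypothetical, text extraction is assumed to have already produced one list of lines per page, and the filter shown (dropping blank lines and lines containing either copy label) is only one guess at what "heuristic filtering" might involve.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and filter rules are assumptions. */
final class InternalComparisonSketch {

    /**
     * Collects the lines of pages firstPage..lastPage (1-based, inclusive),
     * collapsing whitespace and skipping blank lines and lines that carry
     * a copy label, since those are expected to differ between sections.
     */
    static List<String> section(List<List<String>> pages, int firstPage, int lastPage) {
        List<String> lines = new ArrayList<>();
        for (int p = firstPage; p <= lastPage && p <= pages.size(); p++) {
            for (String line : pages.get(p - 1)) {
                String normalized = line.strip().replaceAll("\\s+", " ");
                boolean isLabel = normalized.contains("COPIA EDENRED")
                        || normalized.contains("COPIA CLIENTE");
                if (!normalized.isEmpty() && !isLabel) {
                    lines.add(normalized);
                }
            }
        }
        return lines;
    }

    /** Compares two sections position by position and describes each mismatch. */
    static List<String> differences(List<String> a, List<String> b) {
        List<String> diffs = new ArrayList<>();
        int max = Math.max(a.size(), b.size());
        for (int i = 0; i < max; i++) {
            String left = i < a.size() ? a.get(i) : "<missing>";
            String right = i < b.size() ? b.get(i) : "<missing>";
            if (!left.equals(right)) {
                diffs.add("line " + (i + 1) + ": \"" + left + "\" vs \"" + right + "\"");
            }
        }
        return diffs;
    }
}
```

With the default ranges, a caller would compare `section(pages, 1, 3)` against `section(pages, 4, 6)`. A positional comparison like this is sensitive to inserted or missing lines, which is one reason the filtering may need refinement.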
Configure via src/main/resources/application.yaml. TODO comments in the code indicate areas for potential future configuration enhancements.