Portfolio Performance compatible scraper for Hungarian instruments.
Scraping runs as a batch job for all data sources every weekday at around 18:00 UTC, and the results are uploaded to the pp-data repository.
Main features:
- daily price data scraping in JSON format, per instrument, for the data sources listed below
- optional historical price generation
| Data source name | Spider name | Notes |
|---|---|---|
| Alfa | alfa_nyugdij | Can scrape historical data |
| Allianz | allianz_nyugdij | Can scrape historical data |
| Aranykor | aranykor | Scrapes historical data |
| Bamosz | bamosz | Supports historical scraping with Splash |
| Budapest | budapest_nyugdij | Can scrape historical data, scrapes VPF and PPF funds |
| Erste | erste_nyugdij | Can scrape historical data from a hand-crafted CSV |
| Honved | honved_nyugdij | Can scrape historical data |
| Horizont | horizont_nyugdij | Can scrape historical data |
| MÁK | mak | Scrapes only latest data |
| MÁK | mak_historical | Scrapes historical data for a given time range from the PDF report generator endpoint. Uses Tesseract OCR to extract data from the PDF files. Best effort: the OCR occasionally misreads table contents |
| MBH | mbh_nyugdij | Can scrape historical data |
| OTP | otp_nyugdij | Can scrape historical data |
| Pannónia | pannonia_nyugdij | Can scrape historical data |
| Szövetség | szovetseg_nyugdij | Scrapes historical data from Excel |
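
The spider names in the table can be turned into command lines for a batch run. The following is a minimal sketch, assuming the project is a standard Scrapy project in which each spider is started with `scrapy crawl <spider name>`. The `-O` output option (available in recent Scrapy versions) and the `output/` directory are illustrative assumptions; the actual job may write its results differently.

```python
import shlex
from typing import List

# Spider names copied from the table above.
SPIDERS = [
    "alfa_nyugdij", "allianz_nyugdij", "aranykor", "bamosz",
    "budapest_nyugdij", "erste_nyugdij", "honved_nyugdij",
    "horizont_nyugdij", "mak", "mak_historical", "mbh_nyugdij",
    "otp_nyugdij", "pannonia_nyugdij", "szovetseg_nyugdij",
]


def crawl_command(spider: str, output_path: str) -> List[str]:
    """Build a `scrapy crawl` command for one spider.

    Assumption: a standard Scrapy project layout. The output path is
    hypothetical and only used for illustration.
    """
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider}")
    return ["scrapy", "crawl", spider, "-O", output_path]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no
    # side effects. Pass each list to subprocess.run() to execute it.
    for name in SPIDERS:
        print(shlex.join(crawl_command(name, f"output/{name}.json")))
```

Building the commands as lists rather than shell strings avoids quoting problems if a path contains spaces.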
For local execution you need to install the following packages.
- Install the system packages: `sudo apt-get install python3 python3-dev python3-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev python3-venv docker.io tesseract-ocr poppler-utils`
- Install the Python dependencies: `pip install -r requirements.txt`
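
After installation, it can help to confirm that the external command-line tools are on `PATH` before running the spiders that depend on them. This is a rough sketch, not part of the project: it assumes the relevant executables are `docker` (from `docker.io`), `tesseract` (from `tesseract-ocr`) and `pdftoppm` (one of the tools shipped in `poppler-utils`). Which poppler tool the scraper actually calls is an assumption.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executables provided by the apt packages listed above.
REQUIRED_TOOLS = ("docker", "tesseract", "pdftoppm")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH.

    The `which` parameter is injectable so the check can be tested
    without depending on the local machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```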