50sotero/README.md

Hey, I'm Victor Sotero

Computer scientist from Brazil, building data systems at scale in Sweden.

Senior Data Engineer based in Gothenburg. Over the past 5+ years I've helped design pipelines that processed over a billion transactions, migrated terabyte-scale platforms to the cloud, and built marketing data systems from scratch for companies with hundreds of locations. I care about clean architecture, data governance, and making complex systems feel simple.


🔭 Where I've been

  • Zeekr Technology Europe: Part of the team building an EU-integrated data platform that unites scattered data sources across European operations, helping shape the approach to EU Data Act compliance and governance.
  • JCPenney: Played a key role in bringing the Marketing Technology data platform in-house, replacing an outsourced solution. Helped design the pipelines, migrate the data, and keep Big Data processing running smoothly across 650+ store locations on AWS.
  • Adobe: Contributed to the migration of terabyte-scale workloads from on-premises Cloudera/Hive to AWS EMR and S3, work that meaningfully reduced operational costs and improved scalability.
  • Banco Inter: Helped migrate legacy SQL procedures to PySpark and process billions of CDC transactions via Kafka into Delta Tables for one of Brazil's largest digital banks.

🛠 Tech Stack

Languages: Python, SQL, TypeScript

Data Processing: PySpark, Apache Spark, Databricks, Delta Lake, dbt

Cloud & Infrastructure: AWS, EMR, S3, Redshift, Glue, Terraform, Docker, Cloudera, Trino

Orchestration & CI/CD: Airflow, GitLab CI/CD, GitHub Actions, Oozie

Streaming & Integration: Kafka, Change Data Capture

Data Architecture: Data Lakehouse, Data Modeling, ETL/ELT


📊 Code Metrics

Code velocity chart showing 1407 commits, 4.07 commits per day, 5.2k lines changed per day, monthly trend lines, and source LOC language percentages led by TypeScript 49%

Generated with the reusable code metrics SVG generator.

📌 What I'm up to

  • Building EU-compliant data platforms at Zeekr with Databricks, Unity Catalog, and Terraform
  • Designing governance and ingestion pipelines for cross-border data operations
  • Exploring AI-driven data augmentation and NLP side projects
  • Always looking for better ways to make data accessible and trustworthy

📚 Background

I started in academia, researching educational technologies and NLP for chatbot interactions at university in Maceió, Brazil. That research background shaped how I think about problems: methodically, with an eye for what the data is actually telling you.

  • Databricks Certified Data Engineer Associate
  • University of Michigan: Machine Learning & Data Visualization certifications
  • CSEDU International Conference: Published research on educational technologies in medical education

📫 Let's connect

LinkedIn Portfolio Email

Popular repositories

  1. ai_data_augmentor Public

    ai_data_augmentor is a lightweight Streamlit app designed to enhance small datasets by generating synthetic rows using OpenAI. The app allows users to upload a CSV file, specify the number of rows …

    Python 2

  2. chatbot Public

    Jupyter Notebook

  3. data-science-from-scratch Public

    Forked from joelgrus/data-science-from-scratch

    Code for the book Data Science From Scratch

    Python

  4. xml_abstract-text_clustering Public

    Jupyter Notebook

  5. gesture_volume_control Public

    Python

  6. 50sotero Public

    Config files for my GitHub profile.

    Python