The Five Stages of AI in Software Engineering
Denial. Anger. Bargaining. Depression. Acceptance. I see it in the comments. Every time someone points out what’s happening in software engineering right now—the quiet collapse of the middle, the rise of AI-powered developers, the shrinking need for entry-level engineers—people resist. I get it. Three years ago, I would have spent an hour on something that […]
Reducing Docker Image sizes with Multi-Stage Builds and Distroless
Imagine you are a Data Engineer at a large company with multiple deployments per day. You’re using Docker images to containerize your ETL jobs, which consume data from an external API and load it into your data warehouse. You’ve noticed that your CI/CD pipeline takes around 15 minutes to complete, as each deployment requires building, […]
Data Engineering in Azure: understand PDFs using LLMs
Dealing with unstructured data is always interesting, especially when it means building a solution to parse PDFs. Many companies and individuals use PDF files daily, and PDFs are used to distribute all kinds of information, from simple text to complex tables and diagrams. Over the years, there have been multiple approaches to convert (unstructured) data from […]
Organization Migration in Terraform Cloud
Streamlining Infrastructure Organization Migration in Terraform Cloud. Terraform, developed by HashiCorp, is a robust tool for defining and provisioning infrastructure as code (IaC). Using the HashiCorp Configuration Language (HCL), Terraform allows you to specify your desired infrastructure state in configuration files, covering resources like virtual machines, networks, storage, and more. Once configured, Terraform automatically manages […]
Data Builder Dan: Episode 1 – Metadata Mayhem
Metadata Mayhem disrupts data organization and understanding in the digital realm. Dan explores metadata management in an effort to restore clarity and order. In the world of data, it’s not fun unless it’s challenging! Also, check out the blog associated with this episode to dive into Metadata Management and to follow a hands-on tutorial with DataHub […]
Volume 1: Metadata Management – Part 2: Deep-dive on Metadata Management with DataHub
Let’s look at implementing DataHub. The metadata management options highlighted in our previous blog, Volume 1: Metadata Management – Part 1, depend on several considerations, and any of them may be a great choice for your specific needs. However, let’s choose one tool and dive into what such an implementation may look like. For this purpose, we […]
Volume 1: Metadata Management – Part 1: Understanding & Select Tools
Metadata management is an important part of data governance, but data governance encompasses broader measures that help manage all data assets within an organization, such as setting up data policies, establishing data stewardship and ownership, improving data quality, and ensuring data privacy and security, to name a few. Metadata management focuses on handling information […]
End-to-end MLOps with Databricks: A hands-on tutorial
Machine Learning (ML) model development does not end with training and validation.
Exploring LLMs in Gaming
Remember those role-playing game (RPG) moments with fixed Non-Playable Character (NPC) chats? Well, they’ll probably become a thing of the past… With more powerful LLMs, a lot more options become available, so I found myself with some questions.
Explainable Machine Learning using SHAP
Models that we put in production need to be explainable: we need to understand how each feature impacts the overall predictions.