Why we need data engineering benchmarks for LLMs
Tools like Copilot and other GPT-based assistants promise to reduce the repetitive burden of data engineering tasks, suggest code, and even debug complex pipelines. But how do we measure whether they’re actually good at this work?