• wewbull@feddit.uk
    12 hours ago

    I’ve been experimenting with agentic coding for the past couple of weeks. The task is to write a data scraper for a report file I get out of a commercial tool I have to use for work.

    It’s a pain of a format because it’s not written with computer parsing in mind. It’s verbose, contains loads of redundant parts, and doesn’t have good delimiters around data. It’s big too: 500 MB uncompressed, so we keep the files gzip’d.

    All reasons why I don’t want to write the code to do it.

    The model identifies the file format without me saying where it came from, but it sits in this loop:

    • “Let me analyse the input file” - Does various greps, seds, and awks to pull out sections and find patterns in their formatting.
    • “I understand the format enough for now” - and then proceeds to write out a list of rules it’s discovered. This bit is actually quite impressive.
    • “Now I need to draft the data structures the data will go into” - …and it will write some over-decomposed objects. Not out to disk though.
    • “The user says they want a parser, so let me start writing the actual code” - Finally! But hang on…
    • “Actually, I need to understand the file format more” - loop to the top.

    It does this for hours.

    The tiny bits of code I’ve actually managed to get out of it are really bad. It’s like the code you’d get back from some race-to-the-bottom offshore software “team” you were forced to work with 10-15 years ago because your boss had found an “amazing opportunity”. In actuality it was somebody’s teenage nepo-hire. Similar adherence to rules and standards too.

    I already have a rough data scraper for this file. It’s a couple of hundred lines of Python. I wrote it in an afternoon. It’s not great. It doesn’t get everything I want out. However, it exists and is usable. This isn’t an intractable task.
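
To give a rough idea of the shape (this is an illustration, not the actual script), a scraper like that can stream the gzip’d report line by line so the 500 MB never has to sit in memory or on disk uncompressed. The real format isn’t shown here, so the `== NAME ==` section headers and `key : value` lines below are made-up stand-ins, and `parse_lines` / `parse_report` are hypothetical names.

```python
import gzip
import re
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical layout, assumed for this sketch only:
# "== SECTION NAME ==" header lines, followed by "key : value" lines.
SECTION_RE = re.compile(r"^==\s*(?P<name>.+?)\s*==\s*$")
FIELD_RE = re.compile(r"^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*?)\s*$")


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (section_name, fields) one section at a time."""
    name = None
    fields: Dict[str, str] = {}
    for line in lines:
        header = SECTION_RE.match(line)
        if header:
            # A new header closes the previous section, if there was one.
            if name is not None:
                yield name, fields
            name, fields = header.group("name"), {}
            continue
        field = FIELD_RE.match(line)
        # Lines outside a section, or without a "key : value" shape,
        # are treated as redundant noise and skipped.
        if field and name is not None:
            fields[field.group("key")] = field.group("value")
    if name is not None:
        yield name, fields


def parse_report(path: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Stream a gzip'd report from disk without decompressing it first."""
    # Text-mode gzip.open decompresses lazily as the file is iterated.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from parse_lines(fh)
```

Keeping the line-level logic in `parse_lines` separate from the file handling means the rules can be checked against a handful of pasted lines without touching the full report.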