Learn what it means to be truly AI-native — join our live product demo on Thursday at 10am PT

Fintech

I tried Claude Cowork for procurement—I’m stunned

There’s a lot of buzz around AI and agents right now. But what are people actually doing with them in supply chain and finance? My team gave AI agents a concerted try, and I honestly don’t think I’ve seen a moment like this in my career, one when things changed so completely.

Author: Zarina Zayas Ortiz

I live in the San Francisco Bay Area and read four AI newsletters weekly, so I’m starting from a decent amount of context. But what I’ve seen is that Claude Cowork can reliably do the work of a junior analyst in a way that other tools haven’t been able to approach.

In this article, I explain the test we ran and the results. To summarize, Cowork can reliably complete in minutes the tasks that used to take an analyst hours. Cowork reviewed my ERP data and wrote a staffing strategy that was insightful enough that our chief operating officer (COO) spent an evening reviewing it. I don’t know what it means for the future of work, and especially junior work, but this is here and available today.

In supply chain, it’s looking like an infrastructure arms race

I work in supply chain and procurement at an energy company that’s using software and robotics to service solar fields and make the industry far more cost efficient. So we have something of a stake in the energy conversation around these technologies. For the past several years, I’ve been trying to understand the movement in this space and all the AI tools, and it very much resembles an arms race for infrastructure. The bottleneck for AI companies is no longer models. It’s compute, power, cooling, networking, and supply chain resilience. The big tech companies that are thinking long term are vertically integrating to acquire silicon, data centers, and energy, so they can control their performance and costs.

And yet, as widely as these tools seem to be adopted, we’re still in what experts are calling “the productivity phase,” meaning we’re typically only asking these tools to do what we already do, somewhat faster. I’ve been encouraging my team and company to think further out. How do we shift from just a cloud abstraction to truly thinking AI native? How do we move from chatbots to autonomous systems, and from prompting to AI executing the whole workflow, analysis through negotiation, for an entire procurement cycle?

That's the theory. But on January 12, 2026, Anthropic finally released Claude Cowork for the Mac. We'd been watching others use it on the PC and eagerly awaiting it. Cowork is a desktop application that, unlike a browser-based chat window, lives on your computer and can complete autonomous work.

Now, of course, that’s too much risk for most people, as evidenced by the fact that even Meta’s Director of AI Alignment had to sprint to unplug her Mac to stop the AI agent OpenClaw from deleting all her emails.

People have every reason to be cautious and cynical around this technology. This is why Cowork can only work within folders where you give it access, a sort of walled garden, which is how we started out. You give it a minor responsibility, test it, and then give it more.

Procurement work involves a lot of repetitive tasks. There’s reporting that has to come from an ERP system. There’s opening purchase orders and requisitions, and running spend analyses. You’re typically relying on an analyst either onshore or offshore, and each task might take them between thirty minutes and two hours. By the end of our testing, we had the agent completing those tasks in one to two minutes. I’ll explain how we got there.

To test, gradually escalate the access and requests

We’d already learned to run these tests because, working within the Microsoft ecosystem, we’ve had access to Copilot. Early on, I gave it simple tasks in Excel and was surprised to find it couldn’t handle even basic summation. I didn’t try again. I applied the same start-simple logic to Cowork.

Create a folder

I gave Cowork access to an Excel dashboard I had to show to my COO. I dropped all my related purchase orders into Claude’s folder and asked it a question to which I already knew the answer:

“What can you tell me about Project A? Which of my purchase orders are based on this file here? Help me understand, based on today’s date, along with delivery dates, what’s the risk?”

Claude thought for 30 seconds, and replied correctly. 
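
For readers who want to check an answer like this by hand, the underlying question (given today's date and the delivery dates, which purchase orders are at risk?) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of what Cowork does internally. The `PurchaseOrder` fields, the `delivery_risk` and `project_risk` functions, and the seven-day buffer are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PurchaseOrder:
    # All field names are hypothetical; a real ERP export will differ.
    po_number: str
    project: str
    need_by: date             # date the project needs the goods on site
    promised_delivery: date   # supplier's promised delivery date
    received: bool = False


def delivery_risk(po: PurchaseOrder, today: date, buffer_days: int = 7) -> str:
    """Classify one purchase order by schedule risk."""
    if po.received:
        return "closed"
    if po.promised_delivery < today:
        # The promised date has passed and nothing has been received.
        return "overdue"
    slack = (po.need_by - po.promised_delivery).days
    if slack < 0:
        return "late"       # promised after the project needs it
    if slack < buffer_days:
        return "at risk"    # arrives in time, but with little margin
    return "on track"


def project_risk(orders, project: str, today: date) -> dict:
    """Map each PO number on a project to its risk label."""
    return {
        po.po_number: delivery_risk(po, today)
        for po in orders
        if po.project == project
    }
```

Running a check like this on a small sample before asking the agent gives you a known answer to compare its reply against.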

Increase task complexity, try again

I began to give Claude more access and more complex tasks. I planned to do this until we discovered its areas of incompetence, or its “defined edges,” so we knew what to use it for, and what not to use it for. I did not encounter those edges, though to be fair, we are testing as a part-time task, not a full-time job. But I found each successful test exhilarating. 

Ask a credible third party to judge the results

This past week, I had to run a big analysis for my COO around organizational effectiveness. I let my COO know I was doing this, and I gave Claude a big field promotion along with all the necessary files. I downloaded a bunch of data from our ERP (so as to keep Claude isolated from the live system), which included purchase orders, purchase requisitions, inventory, and more. I did not attempt to organize any of it.

“You have access to these files. Trying to put together a dashboard for the purpose defined that helped me understand where my staffing gaps are and what my staffing recommendations should be in the next year.” 

This task typically takes an analyst 2-4 hours, including cleaning up the data. Claude gave us an output in five minutes. It asked me one clarifying question, then resumed. 

My COO was actually impressed by the result. He spent some time that night going back through the different insights, and said, “This is actually pretty useful.” It got us thinking down paths we hadn’t considered. As a “passable” work product, this is something we gladly would have accepted from an analyst.

What about hallucinations?

This may sound strange, but we aren’t finding any errors of invention. We are very detail-oriented people and thorough in our analysis, and there are errors. But the errors arise because the model faithfully analyzes the faulty data we provide. Where there’s a figure or result that’s off, we tend to trace it back to our product life cycle system, where someone mislabeled products. So far, Claude’s issue is that it works too fast off of our inaccurate data.
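
A cheap way to catch this kind of mislabeling before handing an export to an agent is to look for part numbers that carry more than one label. The snippet below is a minimal sketch, assuming the export can be reduced to `(part_number, category)` pairs. The function name `conflicting_labels` and the sample values in the usage note are hypothetical.

```python
from collections import defaultdict


def conflicting_labels(rows):
    """Return part numbers that appear with more than one category label.

    `rows` is an iterable of (part_number, category) pairs. Labels are
    compared after trimming whitespace and lowercasing, so "Inverter" and
    "inverter " count as the same label.
    """
    seen = defaultdict(set)
    for part_number, category in rows:
        seen[part_number].add(category.strip().lower())
    return {
        part: sorted(labels)
        for part, labels in seen.items()
        if len(labels) > 1
    }
```

For example, a part listed once as "Cable" and once as "Connector" would be returned for a human to resolve, while a part whose labels differ only in capitalization would not.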

What this means for the future of work

We have lots more testing to do before we entrust this with anything close to human-level work on important projects. We are not planning to lay anyone off. There is still plenty of work to do around data quality and providing the models with what they need to work. I would explain the result less as replacement and more as compressing decision cycles from weeks into just hours and minutes.

But it does feel like we’re making progress on the question, “What is the exponential leap forward, here?” Our job is to respond quickly, know the market, and be mindful of costs. It’s work we might have outsourced offshore before, and accepted the delay. But do agents now mean the outsourced jobs are coming back, and more focused around enabling said agents? It’s certainly contributed to a culture in our company of saying, “With this rote task, what stopped you from trying AI?” 

A different world of work is coming. We don’t know precisely what it will look like. But I imagine it means our quarterly IT check-ins will happen monthly, if not more frequently, and that we’ll have to carefully control our AI and cloud costs. And in supply chain, I will be much less eager than before to sign long-term contracts with LLM vendors. It’s all changing so quickly.
