AI-Powered Expense Report Generator

A few months ago, after qwen2.5-vl came out, I hacked together a short Python script that runs receipts through the model. Combine this with a Pydantic schema, and you have a powerful way to extract information from receipts according to a predefined structure.

While this worked well for me, it would have been difficult for any of my colleagues at Data41 to use.

I decided that it would be nice to have a web application to handle the receipts for me. Like any stereotypical developer, I decided to spend hours automating a task that can be done in 15 minutes, and had a lot of fun along the way.

Here's a quick example of what I built.

Architecture

Recently, I've been seeing a lot of hype around @tanstack/db. It provides some nice primitives for synchronizing server state with client state: all mutations are made on the client and then synced up with the server, as opposed to the traditional "make a POST request, get the result, update the UI with another GET" cycle.

The other latest-and-greatest library I've been meaning to try out is the Vercel AI SDK. After working on this project, I've found that the AI SDK offers a best-in-class developer experience while supporting a multitude of AI providers. I opted for Azure OpenAI for this project, with GPT-5-mini as the model of choice.

For everything else, I used a typical web application stack: TypeScript, Next.js, Prisma (Postgres), Better-Auth, and shadcn/ui.

Building the App

To keep this article focused on the concepts I want to show, I'll omit some setup and UI boilerplate from the text. However, you can find the whole repository on my GitHub if you're interested in those details.

With that out of the way, let's talk about collections in @tanstack/db.

Collections

Collections are the main primitive provided by @tanstack/db, and the library ships a few different flavors of them (query-backed, local-only/in-memory, and others).

I opted for a combination of Query Collections and In Memory Collections. Let's see an example from the application.

Collection Definitions

// @/client-db/collections.ts

// createCollection comes from @tanstack/react-db and queryCollectionOptions from
// @tanstack/query-db-collection. getReports/createReports/updateReports/deleteReports
// are server actions, and queryClient is the app's shared TanStack Query client.
import { createCollection } from "@tanstack/react-db";
import { queryCollectionOptions } from "@tanstack/query-db-collection";

export const reportCollection = createCollection(
	queryCollectionOptions({
		queryKey: ["reports"],
		// Initial load: fetch all reports through a server action.
		queryFn: async () => {
			return await getReports();
		},
		getKey: (x) => x.id,
		queryClient,
		// The three handlers below persist local mutations back to the server.
		onInsert: async (tx) => {
			const inserts = tx.transaction.mutations.map((x) => ({ ...x.modified }));
			await createReports(inserts);
		},
		onUpdate: async (tx) => {
			const updates = tx.transaction.mutations.map((x) => ({ ...x.modified }));
			await updateReports(updates);
		},
		onDelete: async (tx) => {
			const keys = tx.transaction.mutations.map((x) => x.key);
			await deleteReports(keys);
		},
	}),
);

If you're used to react-query, you'll feel right at home. You'll see the familiar queryKey and queryFn properties, along with a getKey for identifying records and handlers for the basic CRUD operations. Whenever data is modified in the collection, the changes are pushed to one of the three mutation handlers to persist them to the server's database.
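
The getReports, createReports, updateReports, and deleteReports calls above are Next.js server actions. The real implementations live in the repo; here's a minimal sketch of what they might look like with Prisma (the Report model and the shared prisma client path are assumptions):

// Sketch only: a plausible shape for the server actions used above.
"use server";

import type { Report } from "@prisma/client"; // assumes a Report model
import { prisma } from "@/lib/prisma"; // assumed shared Prisma client

export async function getReports(): Promise<Report[]> {
	return prisma.report.findMany();
}

export async function createReports(reports: Report[]) {
	await prisma.report.createMany({ data: reports });
}

export async function updateReports(reports: Report[]) {
	// Prisma has no per-row bulk update, so run the updates in one transaction.
	await prisma.$transaction(reports.map(({ id, ...data }) => prisma.report.update({ where: { id }, data })));
}

export async function deleteReports(ids: string[]) {
	await prisma.report.deleteMany({ where: { id: { in: ids } } });
}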

We have six collections in our application, all defined in @/client-db/collections.ts. I've omitted the rest here because they follow the same format as the collection above.
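
The in-memory collections don't follow that format, since they never touch the server. Here's a minimal sketch of what the pending-extraction collection could look like, assuming @tanstack/db's local-only collection options (the exact definition is in the repo):

// Sketch only: an in-memory collection with no server persistence.
import { createCollection, localOnlyCollectionOptions } from "@tanstack/db";

// Field names inferred from how the collection is used later in this article.
type ReceiptPendingExtraction = {
	id: string;
	expenseReportId: string;
	status: "PENDING" | "EXTRACTING";
	file: File;
	partial: Record<string, unknown>; // streamed extraction fields, filled in over time
	preview: string | null; // object URL used to preview image uploads
};

export const receiptsPendingExtractionCollection = createCollection(
	localOnlyCollectionOptions({
		getKey: (x: ReceiptPendingExtraction) => x.id,
	}),
);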

Using a collection

Before I show a complex use case for a collection, let's look at a basic example from the app. After a receipt has been processed, we allow the user to modify it within the application, including its category. I've created a category picker that pulls from the category collection and writes to it when the user enters a category that doesn't exist yet. Let's see some code.

// @/components/category/select.tsx

// Imports omitted, uses shadcn-ui combobox setup.

type CategorySelectProps = {
	value: string;
	onValueChanged: (s: string) => void;
	children?: React.ReactNode;
};

export function CategorySelect(props: CategorySelectProps) {
	const { children } = props;

	const [open, setOpen] = useState(false);

	return (
		<Popover open={open} onOpenChange={setOpen}>
			<PopoverTrigger>{children ?? <Button />}</PopoverTrigger>
			<PopoverContent className="p-0 w-[200px]">
				<CategoryCommand
					value={props.value}
					setValue={(v) => {
						props.onValueChanged(v);
						setOpen(false);
					}}
				/>
			</PopoverContent>
		</Popover>
	);
}

export function CategoryCommand(props: { value: string; setValue: (s: string) => void }) {
	const { value, setValue } = props;
	const [search, setSearch] = useState("");
	const { data: categories } = useLiveQuery(
		(q) =>
			q
				.from({ category: categoryCollection })
				.orderBy((x) => x.category.name)
				.where((x) => like(x.category.normalizedName, `%${search.toUpperCase()}%`)),
		[search],
	);

	function handleSearchKeyDown(e: React.KeyboardEvent<HTMLInputElement>) {
		if (e.key === "Enter") {
			e.preventDefault();

			if (search.length === 0) {
				return;
			}

			const newCategoryId = cuid2();
			categoryCollection.insert({
				id: newCategoryId,
				name: search,
				normalizedName: search.toUpperCase(),
				createdAt: new Date(),
				updatedAt: new Date(),
			});

			setValue(newCategoryId);
		}
	}

	return (
		<Command value={value} shouldFilter={false}>
			<CommandInput placeholder="Search Categories" value={search} onValueChange={setSearch} onKeyDown={handleSearchKeyDown} />
			<CommandList>
				<CommandEmpty>No categories found.</CommandEmpty>
				<CommandGroup>
					{categories.map((x) => (
						<CommandItem
							key={x.id}
							value={x.id}
							onSelect={() => {
								setValue(x.id);
							}}
						>
							<CheckIcon className={cn("mr-2 h-4 w-4", value === x.id ? "opacity-100" : "opacity-0")} />
							{x.name}
						</CommandItem>
					))}
				</CommandGroup>
			</CommandList>
		</Command>
	);
}

There are two key pieces I want to highlight. The first is how we query the collection.

const { data: categories } = useLiveQuery(
	(q) =>
		q
			.from({ category: categoryCollection })
			.orderBy((x) => x.category.name)
			.where((x) => like(x.category.normalizedName, `%${search.toUpperCase()}%`)),
	[search],
);

The syntax follows a SQL-like API: we select from a specific collection, apply some ordering, and filter. The beautiful thing about live queries is that they update as soon as data in the collection changes, even if those changes come from a different component. You'll also notice a dependency array after the query definition; it follows the usual useEffect / useMemo semantics React devs are familiar with, so the query re-runs whenever the search term changes.

Next, we have an example of modifying data in the collection. When the user presses enter from the combobox input, we create a new category.

function handleSearchKeyDown(e: React.KeyboardEvent<HTMLInputElement>) {
	if (e.key === "Enter") {
		e.preventDefault();

		if (search.length === 0) {
			return;
		}

		const newCategoryId = cuid2();
		categoryCollection.insert({
			id: newCategoryId,
			name: search,
			normalizedName: search.toUpperCase(),
			createdAt: new Date(),
			updatedAt: new Date(),
		});

		setValue(newCategoryId);
	}
}

The insert call modifies the data locally and queues an insert using the onInsert handler we defined in the collection's definition. That's all it takes to make changes locally and have them automatically propagate across the application and to the server. We create the id for the category on the client so that there's no drift between server state and client state; you'll see that pattern often in the source code.
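
As an aside, the cuid2() helper used for those ids is presumably an alias along these lines; cuid2 ids are collision-resistant, which is what makes minting them on the client safe:

// Assumed alias: collision-resistant ids that are safe to generate client-side.
import { createId as cuid2 } from "@paralleldrive/cuid2";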

A more involved example

At this point, you might be wondering what all the hype is about. Let's go through a more complex example that shows off the power of @tanstack/db.

The most complex component in the application is the upload form, found in @/components/report/upload-form.tsx. It's admittedly a large component, but it juggles a few different responsibilities.

Let's show how this works with another live query example.

const { data: receipts } = useLiveQuery((q) =>
	q
		.from({ receipt: receiptCollection })
		.fullJoin({ receiptExtraction: receiptsPendingExtractionCollection }, ({ receipt, receiptExtraction }) => eq(receipt.id, receiptExtraction.id))
		.leftJoin({ merchant: merchantCollection }, ({ receipt, merchant }) => eq(receipt?.merchantId, merchant.id))
		.leftJoin({ category: categoryCollection }, ({ receipt, category }) => eq(receipt?.categoryId, category.id))
		.where((q) => or(eq(q.receipt?.expenseReportId, initialData.id), eq(q.receiptExtraction?.expenseReportId, initialData.id))),
);

This time, we're joining a few collections together. We use a full join because a receipt that's still processing may not have a corresponding row in the receipts collection, and conversely, once a receipt has been processed it no longer has an entry in the in-memory pending collection. This lets us show both pending and processed receipts in the same view. We also join in related records (merchant and category) for the processed receipts.

Let's put aside the local state for now and talk about how we integrate the Vercel AI SDK with our application.

AI SDK

One of my design goals was to let the user see receipts stream into the application as they're extracted. The AI SDK provides a perfect API for this functionality: streamObject.

streamObject(...) is a function that lets us pass a schema (Zod, in our case) and get a stream of objects as values are generated by the model. Here's the schema we use for extracting our receipts.

import { z } from "zod";

export const ExtractReceiptLineItemSchema = z.object({
	description: z.string(),
	sku: z.string().optional(),
	quantity: z.number().default(1.0).optional(),
	unitPrice: z.number(),
	currency: z.string().default("USD").optional(),
});

export const ExtractReceiptSchema = z.object({
	merchant: z.object({
		name: z.string(),
		address: z.string().optional(),
	}),
	// The model returns a string; make sure it parses to a valid date.
	date: z.string().refine((x) => !isNaN(new Date(x).getTime())),
	currency: z.string(),
	category: z.string().optional(),
	tax: z.number(),
	tip: z.number(),
	paymentMethod: z.string().optional(),
	confidence: z.number(),
	items: z.array(ExtractReceiptLineItemSchema),
});

It's a fairly straightforward schema. There's another schema in the same file that mirrors this one and is essentially a deep partial of it: it lets us validate the shape as it streams in from the model, without requiring every property to be present for validation to succeed.
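
I won't reproduce that second schema here, but a sketch of it, derived from ExtractReceiptSchema above (the name PartialExtractReceiptSchema is mine, not necessarily the one in the repo), would look something like this:

// Sketch only: the streaming-side mirror of ExtractReceiptSchema, with every
// field optional so that partially generated objects still validate.
export const PartialExtractReceiptSchema = z.object({
	merchant: z
		.object({
			name: z.string().optional(),
			address: z.string().optional(),
		})
		.optional(),
	date: z.string().optional(),
	currency: z.string().optional(),
	category: z.string().optional(),
	tax: z.number().optional(),
	tip: z.number().optional(),
	paymentMethod: z.string().optional(),
	confidence: z.number().optional(),
	items: z.array(ExtractReceiptLineItemSchema.partial()).optional(),
});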

Now, let's look at the streamObject invocation.

const stream = createStreamableValue<ReceiptExtractionStreamMessage>();

// ... some other stuff (processing images and PDFs, getting data in the right shape for our call)

(async () => {
	const { partialObjectStream } = streamObject({
		model: ai,
		schema: ExtractReceiptSchema,
		output: "object",
		messages: [
			{ role: "system", content: systemPrompt },
			{
				role: "user",
				content: [
					{ type: "text", text: "Please extract receipt information from the attached file(s). Return only the JSON object that matches the schema." },
					...files.map((x) => ({
						type: "image" as const,
						image: Buffer.from(x.buffer),
					})),
				],
			},
		],
	});

	// Forward each partial object to the client as it streams in, keeping
	// track of the last one so we can validate the final result below.
	let last: unknown = undefined;
	for await (const partialObject of partialObjectStream) {
		last = partialObject;
		stream.update({ type: "partial", partial: partialObject });
	}

	const fullReceiptResult = ExtractReceiptSchema.safeParse(last);
	if (fullReceiptResult.success) {
		// Persist the original files to blob storage...
		const container = storage.getContainerClient("receipts");

		for (const [index, file] of files.entries()) {
			const blobClient = container.getBlockBlobClient(`${reportId}/${receiptId}/receipt-${index}.${file.extension}`);
			await blobClient.upload(Buffer.from(file.buffer), file.buffer.byteLength);
		}

		// ...then create the database records and send them down to the client.
		const [createdReceipt, createdLineItems, createdMerchant, createdCategory] = await createExpenseReportFromParse(
			fullReceiptResult.data,
			reportId,
			receiptId,
			files.length,
		);

		stream.done({
			type: "finished",
			result: {
				receipt: createdReceipt,
				lineItems: createdLineItems,
				merchant: createdMerchant,
				category: createdCategory,
			},
		});
		return;
	}

	stream.done();
})();

We first create a streamable value and return it from the function so the client can receive updates as the model processes the data. There are two event types in this stream: partial updates and a completion update. A partial update just forwards the raw output from the model. We keep track of the last object from the stream so that we can validate it against the full result schema at the end.
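
For reference, here's a sketch of what the ReceiptExtractionStreamMessage union might look like, inferred from how it's used above (the Prisma-generated types and the PartialExtractReceiptSchema name are assumptions carried over from earlier in this article):

// Sketch only: the stream's message union, as implied by the code above.
import type { Category, Merchant, Receipt, ReceiptLineItem } from "@prisma/client";
import type { z } from "zod";

type ReceiptExtractionStreamMessage =
	| { type: "partial"; partial: z.infer<typeof PartialExtractReceiptSchema> }
	| {
		type: "finished";
		result: {
			receipt: Receipt;
			lineItems: ReceiptLineItem[];
			merchant: Merchant | null;
			category: Category | null;
		};
	};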

Once generation has finished, we upload the files to our storage service (in this case, Azure Blob Storage) and create the records in the database. You might be wondering why we do the insert on the backend when the whole point of this project is to be local-first. The reason is that LLM inference can get expensive, and we don't want the result to be lost if the user refreshes the page before generation completes.

Luckily, we can return all of the created data from the function and write it directly into the receipts collection without triggering another CRUD operation.

Let's look at some front end code.

Consuming the stream

function startReceipt(file: File) {
	const receiptId = cuid2();
	// Create the pending record immediately so the UI can show the upload right away.
	receiptsPendingExtractionCollection.insert({
		id: receiptId,
		expenseReportId,
		status: "PENDING",
		file: file,
		partial: {},
		// Only images get an object-URL preview; file.type is a MIME type.
		preview: file.type !== "application/pdf" ? URL.createObjectURL(file) : null,
	});

	(async () => {
		const formData = new FormData();
		formData.set("file", file);
		formData.set("receipt-id", receiptId);
		formData.set("report-id", expenseReportId);
		const { receiptStream } = await extractReceipt(formData);

		receiptsPendingExtractionCollection.update(receiptId, (r) => {
			r.status = "EXTRACTING";
		});

		for await (const message of readStreamableValue(receiptStream)) {
			if (message === undefined) {
				continue;
			}
			switch (message.type) {
				case "partial": {
					const partial = message.partial;
					if (!partial) {
						continue;
					}

					// Copy the streamed fields onto the pending record as they arrive.
					receiptsPendingExtractionCollection.update(receiptId, (r) => {
						if (!r.partial) {
							return;
						}
						r.partial.merchant = partial.merchant;
						r.partial.category = partial.category ?? "Uncategorized";
						r.partial.confidence = partial.confidence;
						r.partial.date = partial.date;
						r.partial.items = partial.items?.map((x) => x!) ?? [];
						r.partial.paymentMethod = partial.paymentMethod;
						r.partial.tax = partial.tax;
					});
					break;
				}
				case "finished": {
					// Write the server-created records straight into the collections
					// (writeInsert skips the CRUD handlers), then drop the pending record.
					const { receipt, lineItems, merchant, category } = message.result;
					receiptCollection.utils.writeInsert([receipt]);
					receiptLineItemCollection.utils.writeInsert(lineItems);
					if (merchant) {
						merchantCollection.utils.writeInsert(merchant);
					}
					if (category) {
						categoryCollection.utils.writeInsert(category);
					}
					receiptsPendingExtractionCollection.delete(receiptId);

					break;
				}
			}
		}
	})();
}

Here is the function that invokes the server action we discussed earlier. The first thing we do when a file is uploaded is create a record in our in-memory collection. We also create a receipt id ahead of time, so that we can later join it to the processed receipts collection.

We store the partial receipt in the in-memory collection, and update it as partial updates come through the stream.

When the "finished" event comes in, we directly write the result into the collection using writeInsert. This modifies the collection without invoking the CRUD handlers.

As these updates come through and modify either collection, the data returned from the useLiveQuery hook updates instantaneously and the UI re-renders.

Conclusion

Overall, I'm very pleased with how this application turned out. The two libraries I discussed in this article empower developers to build fast, high-quality applications with a great developer experience.

I hope you enjoyed reading this and learned something along the way. For the full code, check out the GitHub repository.
