ChatGPT Action Authentication

Published in AWS Tip · 7 min read · Jan 17, 2024

In this installment of random apps to build, we will create a custom GPT with an action that uses OAuth to give each user a persistent identity, while delegating the actual sign-in to other authentication providers.

Code: https://github.com/garunski/wordswithchat.

Website: https://wordswithchat.com/.

CustomGPT: https://chat.openai.com/g/g-HGmSVHyWf-words-with-chat.

Goal: Create an action for a custom GPT that will allow for user authentication.

Tech Stack: Cloudflare Pages with Functions (fast, simple, serverless, key value storage, free). Google Auth, because a username/password flow is simpler to implement and can be added at any point. Making social login a requirement from the start keeps the solution flexible enough to support any other provider. Also, I strongly discourage everyone from creating yet another authentication provider.

The article assumes familiarity with OAuth flow and OAuth configuration for custom ChatGPT actions.

Initially, the idea was to just use Google OAuth to sign in the user and verify the token sent from ChatGPT on the server side. But one of the restrictions that I ran into is the requirement that all the URLs for the action must match. The servers url in the action schema must share a root domain with the URLs configured in the Authentication dialog. Otherwise you get the error “Authorization URL, Token URL, and API hostname must share a root domain”. This works fine if the goal is to communicate with Google APIs directly, but not if Google is only the authentication for a third party API. Also, localhost doesn't work as a valid url, so using something like https://ngrok.com/ made it easier to test the API locally. Given the limitation, we essentially need to become the authentication provider for the action. In my case, the authentication needs to be proxied to Google and handled server side.

Authorization URL

This is the URL that ChatGPT redirects to after the user clicks the Sign in button, which shows up when ChatGPT needs to make an authenticated call. It needs to be a page where the user decides how to authenticate.

TL;DR (skip the analysis and move on to Token URL): The initial page load presents the user with a choice of the implemented auth providers and starts the login flow for the one selected. Upon successful authentication a code needs to be returned to ChatGPT, which it will then POST to the Token URL to get the access token. In my case I am encrypting the user id along with an expiration time and returning that as the code.
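The hand-off back to ChatGPT at the end of this step can be sketched as a plain redirect. This is a minimal sketch, assuming the standard OAuth authorization response in which the code and the original state are appended to the redirect_uri received on the initial request. The helper names buildCallbackUrl and redirectToChatGpt are hypothetical and not taken from the repository.

```typescript
// Hypothetical helper: builds the URL that returns the code to ChatGPT.
// `redirectUri` and `state` are the values received on the initial request
// to the Authorization URL. `code` is any opaque value that the Token URL
// can verify later (in the article, an encrypted user id plus expiration).
function buildCallbackUrl(
  redirectUri: string,
  code: string,
  state: string | null,
): string {
  const url = new URL(redirectUri);
  url.searchParams.set("code", code);
  // Echo the state back unchanged so the client can match the response
  // to the request it started.
  if (state !== null) {
    url.searchParams.set("state", state);
  }
  return url.toString();
}

// Hypothetical wrapper that turns the URL into a redirect response.
function redirectToChatGpt(
  redirectUri: string,
  code: string,
  state: string | null,
): Response {
  return new Response(null, {
    status: 302,
    headers: { Location: buildCallbackUrl(redirectUri, code, state) },
  });
}
```

Keeping the URL construction separate from the Response makes the redirect target easy to check without a running function.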

In the simplest case we can default to redirecting to Google Auth and coming back with a token after that flow succeeds. In this setup the auth becomes a true proxy and it’s pretty easy to implement. During the process we would grab the JWT from Google's token URI and return it to ChatGPT. I never tested this method since I wanted to create an auth provider, so I am not sure whether the different urls in the issuer or audience will cause a problem. My guess is probably not. A special case needs to be accounted for when the access_token expires and the refresh_token is sent back for a new access_token, but this can also be proxied directly to Google.

I decided to make it a little more expandable and build in the flexibility to allow for multiple auth providers. My current implementation returns a fully static HTML response without any JavaScript, but it's kinda silly: the only benefit is that the page works with JavaScript off, which is a negligible concern for a client like ChatGPT. Both the login page and the anchor tag that points to the function redirecting to Google should be refactored into a static page that uses JavaScript, which would simplify the codebase and remove the cost of function invocations. The actual implementation of static HTML with environment variable substitution was surprisingly complicated. There are so many very complicated build tools that do wonderfully complex things, and when you just want something small like string replacement they all fall on their face. I tried Vite, but after messing with the defaults and trying different plugins, I gave up and made my current server side solution. The issue I ran into is that Vite can’t replace variables inside HTML attributes such as href in an a tag, while the env replacement works fine outside attributes.

We need an endpoint to receive Google's authentication decision, success or failure, and then use the returned code to get the tokens from Google's token URI. This is also the first place where we must have server side processing, since no one wants to start leaking client secrets. To uniquely identify users, the id token can be decoded and the sub claim used as the Google specific user id. The sub is guaranteed to be unique among Google authenticated users, but that guarantee does not extend across auth providers, so we can’t just use sub as a universal identifier. In my case I am storing some of the token response for refreshing the token. The refresh_token is only returned on the initial authorization; subsequent authorizations only return access_token and id_token. To provide a more seamless experience, we can store the refresh_token and use it to get a new access_token. The access_token from Google should be validated when doing the authenticated check, to account for login revocation. If the refresh fails for whatever reason, a 401 needs to be returned and ChatGPT will start the OAuth flow once again.
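The code exchange and the sub lookup described above might look roughly like the following. This is a sketch rather than the repository's code: the function names are hypothetical, the "google:" prefix is one assumed way to keep ids from different providers apart, and the id token is decoded without signature verification on the assumption that it was just received directly from Google's token endpoint over HTTPS. A token from any other source should be verified first.

```typescript
// Fields this sketch reads from Google's token response.
interface GoogleTokens {
  access_token: string;
  id_token: string;
  refresh_token?: string; // only present on the initial authorization
  expires_in: number;
}

// Exchanges the code from Google's redirect for tokens. This must run
// server side because it sends the client secret.
async function exchangeGoogleCode(
  code: string,
  clientId: string,
  clientSecret: string,
  redirectUri: string,
): Promise<GoogleTokens> {
  const res = await fetch("https://oauth2.googleapis.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      code,
      client_id: clientId,
      client_secret: clientSecret,
      redirect_uri: redirectUri,
      grant_type: "authorization_code",
    }),
  });
  if (!res.ok) {
    throw new Error(`Google token exchange failed: ${res.status}`);
  }
  return (await res.json()) as GoogleTokens;
}

// Decodes (does NOT verify) the payload segment of a JWT.
function decodeJwtPayload(jwt: string): Record<string, unknown> {
  const segment = jwt.split(".")[1];
  if (!segment) throw new Error("not a JWT");
  // JWT segments are base64url without padding; atob wants plain base64.
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Prefixes the provider name (an assumption of this sketch) so that the
// same sub value from two providers never maps to the same user.
function userKeyFromGoogleIdToken(idToken: string): string {
  const sub = decodeJwtPayload(idToken).sub;
  if (typeof sub !== "string") throw new Error("id_token has no sub claim");
  return `google:${sub}`;
}
```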

Token URL

There are two cases where the Token URL gets called. The first is grant_type authorization_code, where a form is posted to the endpoint with the code returned from the Authorization URL. The second is grant_type refresh_token, which sends the refresh_token back; this only happens if the authorization_code response included a refresh_token along with the access_token.
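The two grant types can be handled by a single dispatcher. The sketch below assumes the posted form has already been parsed into URLSearchParams and that the response uses the usual OAuth token fields. The TokenIssuer interface and its method names are hypothetical stand-ins for whatever creates and checks the codes and tokens.

```typescript
// Body returned from the Token URL.
interface TokenResponse {
  access_token: string;
  token_type: "bearer";
  refresh_token: string;
  expires_in: number; // seconds
}

// Hypothetical helpers, injected so the dispatch logic stays testable.
interface TokenIssuer {
  userIdFromCode(code: string): Promise<string | null>;
  userIdFromRefreshToken(token: string): Promise<string | null>;
  issue(userId: string): Promise<TokenResponse>;
}

function json(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

async function handleTokenRequest(
  form: URLSearchParams,
  issuer: TokenIssuer,
): Promise<Response> {
  const grantType = form.get("grant_type");
  let userId: string | null;

  if (grantType === "authorization_code") {
    // First call: trade the code from the Authorization URL for tokens.
    userId = await issuer.userIdFromCode(form.get("code") ?? "");
  } else if (grantType === "refresh_token") {
    // Later calls: trade the refresh_token for a fresh access_token.
    userId = await issuer.userIdFromRefreshToken(
      form.get("refresh_token") ?? "",
    );
  } else {
    return json({ error: "unsupported_grant_type" }, 400);
  }

  if (userId === null) {
    return json({ error: "invalid_grant" }, 400);
  }
  return json(await issuer.issue(userId));
}
```

In a Pages Function the form could be obtained with new URLSearchParams(await request.text()), assuming the body arrives form encoded as described above.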

TL;DR: For both of the cases the tokens get generated without a backing datastore. I chose to encrypt the user id, some padding for validation and expiration into an access_token that can be decrypted and unpacked during authorization.

I wanted to minimize the number of calls made to the KV on Cloudflare, so I chose to encrypt the access_token and refresh_token as part of the flow. This lets most of the authentication skip fetching a stored value from a database, because the important user id is baked into the tokens themselves. This method was chosen purely to lower costs and increase performance on the serverless functions. The choice has drawbacks: when rotating secrets, all tokens are invalidated, including refresh tokens, which restarts the OAuth flow, and that's not the best user experience. If implementing a datastore backed flow, KV is not the best choice, and something like D1 would be a much better storage system.
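Before encryption, the token contents need to be packed into a string, and after decryption they need to be unpacked and checked. A minimal sketch of that step follows, assuming a JSON array layout and a fixed marker string standing in for the "padding for validation". The MARKER value, the layout, and the function names are all assumptions of this example, not the repository's format.

```typescript
// Hypothetical fixed marker; a decrypted payload that does not start with
// it is treated as invalid.
const MARKER = "token-v1";

interface TokenClaims {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

// Produces the plaintext that would be encrypted into a token.
function packClaims(claims: TokenClaims): string {
  return JSON.stringify([MARKER, claims.userId, claims.expiresAt]);
}

// Parses decrypted plaintext; returns null for anything malformed or expired.
function unpackClaims(
  plaintext: string,
  now: number = Date.now(),
): TokenClaims | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(plaintext);
  } catch {
    return null;
  }
  if (!Array.isArray(parsed) || parsed.length !== 3) return null;

  const [marker, userId, expiresAt] = parsed;
  if (marker !== MARKER) return null;
  if (typeof userId !== "string" || typeof expiresAt !== "number") return null;
  if (expiresAt <= now) return null; // expired

  return { userId, expiresAt };
}
```

Because the expiration lives inside the token, checking it costs no datastore read, which is the trade-off described above.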

Authorization

I chose to implement authorization in Pages Functions with middleware. The middleware checks the Bearer token sent by ChatGPT and makes a decision. I use the context.data to store the extracted user id for later use.
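A middleware along these lines could look like the sketch below. It assumes only the parts of the Pages Functions context that the paragraph mentions (request, data, next) and declares a minimal local type for them. The verify callback and the createAuthMiddleware name are hypothetical, with verify standing in for whatever decrypts and validates the access token.

```typescript
// Minimal stand-in for the Pages Functions context, limited to the fields
// this sketch uses.
interface MiddlewareContext {
  request: Request;
  data: Record<string, unknown>;
  next: () => Promise<Response>;
}

// Hypothetical verifier: resolves to the user id, or null when the token
// is invalid or expired.
type VerifyToken = (token: string) => Promise<string | null>;

function createAuthMiddleware(verify: VerifyToken) {
  return async (context: MiddlewareContext): Promise<Response> => {
    const header = context.request.headers.get("Authorization") ?? "";
    const token = /^Bearer\s+(.+)$/i.exec(header)?.[1];
    const userId = token ? await verify(token) : null;

    if (userId === null) {
      // A 401 tells ChatGPT to refresh the token or restart the OAuth flow.
      return new Response("Unauthorized", { status: 401 });
    }

    // Make the user id available to the handlers that run after this one.
    context.data.userId = userId;
    return context.next();
  };
}
```

In a Pages project the returned function would be exported as onRequest from a _middleware.ts file. Placing that file in the directory that holds the protected routes keeps the authorization and token endpoints outside its scope.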

Security

The encryption that I implemented is a very simple AES-GCM implementation using the Web Crypto API. To reduce computational cost, the number of iterations for key derivation is set to 256. This reduces the compute load on the serverless functions and makes each function cheaper to run. The earlier choice to reduce the number of fetches to an external datastore also meant I didn’t have anywhere to store what should be a randomly generated initialization vector and salt for encryption.

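// Splits one long secret string into the three inputs used for encryption.
const generateKeys = (value: string) => ({
  // First 60 characters: password for key derivation.
  password: value.slice(0, 60),
  // Last 40 characters: static salt.
  salt: value.slice(value.length - 40, value.length),
  // 40 characters ending 20 before the end (overlaps the start of the salt): static iv.
  iv: value.slice(value.length - 60, value.length - 20),
});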

The generateKeys function takes a single long string and slices it into the strings needed for encryption. This is not ideal and should not be used in highly secure applications. With so few iterations and a static salt, an attacker who obtains a token can cheaply run an offline dictionary attack if the password portion is guessable, and reusing a static initialization vector with the same key undermines the guarantees AES-GCM is supposed to provide.
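One way to get a random salt and initialization vector without a datastore is to carry them inside the token, since neither value needs to be secret. The sketch below is an alternative to the static approach, not the repository's implementation. It assumes PBKDF2 with SHA-256 for key derivation, keeps the article's iteration count of 256 as a tunable constant, and lays the token out as salt, then iv, then ciphertext, encoded as base64url. All names and sizes here are choices made for this example.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

const SALT_BYTES = 16;
const IV_BYTES = 12; // 96 bits, the usual IV size for AES-GCM
const ITERATIONS = 256; // the article's value; higher costs more CPU

async function deriveKey(password: string, salt: BufferSource): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw", encoder.encode(password), "PBKDF2", false, ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

function toBase64Url(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function fromBase64Url(text: string) {
  const base64 = text.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
}

// Token layout: salt | iv | ciphertext (the GCM tag is part of the ciphertext).
async function encryptToken(plaintext: string, password: string): Promise<string> {
  const salt = crypto.getRandomValues(new Uint8Array(SALT_BYTES));
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES));
  const key = await deriveKey(password, salt);
  const cipher = new Uint8Array(
    await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, encoder.encode(plaintext)),
  );
  const out = new Uint8Array(SALT_BYTES + IV_BYTES + cipher.length);
  out.set(salt, 0);
  out.set(iv, SALT_BYTES);
  out.set(cipher, SALT_BYTES + IV_BYTES);
  return toBase64Url(out);
}

// Returns null for tampered, truncated, or wrongly keyed tokens.
async function decryptToken(token: string, password: string): Promise<string | null> {
  try {
    const bytes = fromBase64Url(token);
    const salt = bytes.slice(0, SALT_BYTES);
    const iv = bytes.slice(SALT_BYTES, SALT_BYTES + IV_BYTES);
    const cipher = bytes.slice(SALT_BYTES + IV_BYTES);
    const key = await deriveKey(password, salt);
    return decoder.decode(await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, cipher));
  } catch {
    return null;
  }
}
```

The cost is 28 extra bytes per token and one key derivation per request, so the iteration count directly trades security margin against function CPU time.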

Having said that, Web Crypto API is an awful interface to use as a developer. I am sure that the choices made were made for streaming and websockets but when trying to do simple encryption, the developer is forced to convert from string to ArrayBuffer and then to Uint8Array. The code is convoluted and has plenty of pitfalls to frustrate the programmer. And I do not think that this “flexibility” is warranted. The simplest way to discourage people from using good practices is to create barriers, such as this. Encryption should be easy to implement with as few gotchas as possible. The API needs to have intuitive interfaces that can themselves handle the low level implementation details. If you have to have a giant warning on the documentation page that basically says, “you will do it wrong, hire someone who knows to check your work”, YOU DID IT WRONG.

The Game

The actual game is guessing a word drawn from one of three endpoints that return the possible words. The instructions are short and define the flow of the game and what to do when the user guesses right.

ask the user which word category they would like to guess, their category choices are animals, places, colors.
once the category is selected, call the action that corresponds to the get for that category, for animals its /api/words/animals, for places its /api/words/places, for colors it is /api/words/colors.
once you have all the words, pick a random word from the array, and remember it. 
give a hint about the word.
you are allowed to give hints about the word.
you are not allowed to say the word. 
you are not allowed to describe the word fully.
if the user guesses the word correctly then POST to the post_score endpoint, the endpoint will return the new score.
if the user asks about their score, GET from get_score and return the score

The game is to demonstrate the authorization and is not meant as a fully fledged game, so yes there is a very easy way to cheese it and get a high score. Cloudflare WAF is configured to help with some abuse prevention but there is much more that can be done to prevent abuse.