What are AI Evals?

An AI eval is two things in one:

  1. It is a way to show users of an AI extension how to use it and what it can do. Users will see suggested prompts when they @-mention your extension:

    [Screenshot: suggested prompts shown when @-mentioning an extension]

  2. It is a way for developers to test that an AI extension works reliably. Evals are like integration tests, but for AI: they let you iterate on the extension's implementation and prompts while ensuring that previously tested scenarios aren't broken:

    [Screenshot: results of an eval run]

Structure

An AI eval consists of three parts: an input (the prompt sent to the AI), mocks (stubbed return values for your tools), and expected (the expectations checked against the result).

Add evals to the ai.evals array in package.json and run ray evals from your extension directory to see the results. Note: you must be authenticated as a member of the Raycast AI Extensions Beta organization.
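The placement described above can be sketched as a package.json fragment. This is a minimal sketch: only the ai.evals field comes from this document, and the surrounding manifest fields and the eval's contents are placeholders.

```json
{
  "name": "todo-list",
  "ai": {
    "evals": [
      {
        "input": "@todo-list ...",
        "mocks": { },
        "expected": [ ]
      }
    ]
  }
}
```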

Example:

{
  "input": "@todo-list Mark the posting the announcement as completed",
  "mocks": {
    "get-todos": {
      "todos": [
        {
          "id": "aef13ef3-9c37-463e-9c93-3573325c0231",
          "text": "Post the announcement"
        }
      ]
    },
    "toggle-todo": {
      "success": true
    }
  },
  "expected": [
    {
      "callsTool": "get-todos"
    },
    {
      "callsTool": {
        "name": "toggle-todo",
        "arguments": {
          "id": "aef13ef3-9c37-463e-9c93-3573325c0231"
        }
      }
    }, 
    {
      "meetsCriteria": "Tells that item was successfully marked as completed"
    }
  ]
}
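For context, here is a sketch of the kind of tool the "get-todos" mock stands in for. The file path, type shape, and function body are assumptions for illustration, not part of this document; only the tool name and the todo fields (id, text) come from the example above.

```typescript
// Hypothetical src/tools/get-todos.ts for the todo-list extension.
type Todo = { id: string; text: string };

// In a real extension this function would be the file's default export.
async function getTodos(): Promise<{ todos: Todo[] }> {
  // A real implementation would read todos from storage or an API.
  // During an eval run, the "get-todos" entry in mocks replaces this
  // return value, so the AI sees the mocked todos instead.
  return {
    todos: [
      {
        id: "aef13ef3-9c37-463e-9c93-3573325c0231",
        text: "Post the announcement",
      },
    ],
  };
}
```

Mocking at the tool boundary like this is what makes evals deterministic: the AI's tool calls and the criteria in expected can be checked without touching real data.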

You can find more examples in the ai-extensions-beta repository.

Expectations