AI evals are two things in one:
It is a way to show users of your AI Extension how to use it and what it can do. Users will see suggested prompts when they @-mention your extension.
It is a way for developers to test that the AI Extension works reliably. It is like integration tests, but for AI: it lets you iterate on the AI Extension's implementation and prompts while ensuring that previously tested scenarios aren't broken.
An AI eval consists of three parts:
input – a text prompt that you expect from users of your AI Extension. It should @-mention your extension by its name (the name from package.json).
mocks – mocked results of tool calls. They are required to give the AI context: for example, if you write an eval for "@todo-list What are my todos?", you need to provide the actual list in the get-todos mock.
expected – an array of expectations, similar to expect statements in unit / integration tests (you'll find the list of all supported expectations below).
Add evals to the ai.evals array in package.json and run ray evals from your extension directory to see the results. Note: you have to be authenticated as a member of the Raycast AI Extensions Beta organization.
For example:
{
"input": "@todo-list Mark the posting the announcement as completed",
"mocks": {
"get-todos": {
"todos": [
{
"id": "aef13ef3-9c37-463e-9c93-3573325c0231",
"text": "Post the announcement"
}
]
},
"toggle-todo": {
"success": "true"
}
},
"expected": [
{
"callsTool": "get-todos"
},
{
"callsTool": {
"name": "toggle-todo",
"arguments": {
"id": "aef13ef3-9c37-463e-9c93-3573325c0231"
}
}
},
{
"meetsCriteria": "Tells that item was successfully marked as completed"
}
]
}
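To make the structure of an eval concrete, here is a minimal TypeScript sketch that models one entry of the ai.evals array and runs two sanity checks on it. The type names (AiEval, Expectation) and the helpers mentionsExtension and toolsMissingMocks are hypothetical and not part of any Raycast API; they only mirror the JSON fields described above.

```typescript
// Hypothetical types mirroring one entry of the ai.evals array in package.json.
// Illustration only: these are not types exported by Raycast.
type Expectation =
  | { includes: string }
  | { matches: string }
  | { meetsCriteria: string }
  | { callsTool: string | { name: string; arguments?: Record<string, unknown> } };

interface AiEval {
  input: string; // prompt that should @-mention the extension
  mocks?: Record<string, unknown>; // tool name -> mocked tool result
  expected: Expectation[];
}

// True if the prompt @-mentions the extension by its package.json name.
function mentionsExtension(ev: AiEval, extensionName: string): boolean {
  return ev.input.includes(`@${extensionName}`);
}

// Heuristic: tools the eval expects to be called but for which no mock is given.
function toolsMissingMocks(ev: AiEval): string[] {
  const mocked = new Set(Object.keys(ev.mocks ?? {}));
  const expectedTools = ev.expected.flatMap((e) =>
    "callsTool" in e ? [typeof e.callsTool === "string" ? e.callsTool : e.callsTool.name] : [],
  );
  return expectedTools.filter((name) => !mocked.has(name));
}

// The todo-list eval from the example above, as a typed object.
const example: AiEval = {
  input: "@todo-list Mark the posting the announcement as completed",
  mocks: {
    "get-todos": { todos: [{ id: "aef13ef3-9c37-463e-9c93-3573325c0231", text: "Post the announcement" }] },
    "toggle-todo": { success: "true" },
  },
  expected: [
    { callsTool: "get-todos" },
    { callsTool: { name: "toggle-todo", arguments: { id: "aef13ef3-9c37-463e-9c93-3573325c0231" } } },
    { meetsCriteria: "Tells that item was successfully marked as completed" },
  ],
};

console.log(mentionsExtension(example, "todo-list")); // true
console.log(toolsMissingMocks(example)); // []
```

Flagging tools without mocks is only a heuristic: mocks give the AI the context it needs, but the rule that every expected tool must be mocked is an assumption of this sketch.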
You can find more examples in the ai-extensions-beta repository.
Supported expectations:
includes – to check that the AI response includes some substring (case-insensitive). Example: {"includes": "added"}
matches – to check that the AI response matches some regexp. Example (to check that the response contains a markdown link): {"matches": "\\[([^\\]]+)\\]\\(([^\\s\\)]+)(?:\\s+\"([^\"]+)\")?\\)"}
meetsCriteria – to check that the AI response meets some plain-text criteria (validated using AI). Useful when the AI varies its response and it is hard to match it using includes or matches. Example: {"meetsCriteria": "Tells that label with this name doesn't exist"}
callsTool – to check that during the request the AI called some tool included in your AI Extension. There are two forms:
Short form, to check that an AI tool with a specific name was called. Example: {"callsTool": "get-todos"}
Long form, to also check the tool arguments: {"callsTool": {"name": "name", "arguments": {"arg1": matcher, "arg2": matcher}}}. Matchers can be complex and combine any of the supported rules:
eq (used by default for any value that is not an object or array)
includes
matches
and (used by default if an array is used)
or
not
Example:
{
"callsTool": {
"name": "create-comment",
"arguments": {
"issueId": "ISS-1",
"body": {
"includes": "waiting for design"
}
}
}
}
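To illustrate how these expectations fit together, here is a minimal TypeScript sketch that evaluates includes and matches against a response and callsTool against a list of recorded tool calls. It is not Raycast's implementation: the names checkIncludes, checkMatches, matchValue, callsTool and ToolCall are hypothetical, and it assumes that matches patterns are JavaScript regular expressions, that the eq, and, or and not rules are written as objects keyed by the rule name (for example {"not": "ISS-1"}), and that includes is case-insensitive for arguments just as it is for responses.

```typescript
// Hypothetical sketch; none of these names are Raycast APIs. Assumptions:
//  - `matches` patterns are JavaScript RegExps;
//  - matchers are objects keyed by eq/includes/matches/and/or/not;
//  - `includes` is case-insensitive for arguments too, as it is for responses.
type Matcher =
  | string
  | number
  | boolean
  | null
  | Matcher[]
  | { eq: unknown }
  | { includes: string }
  | { matches: string }
  | { and: Matcher[] }
  | { or: Matcher[] }
  | { not: Matcher };

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

type CallsTool = string | { name: string; arguments?: Record<string, Matcher> };

// `includes`: case-insensitive substring test.
function checkIncludes(text: string, expected: string): boolean {
  return text.toLowerCase().includes(expected.toLowerCase());
}

// `matches`: regular-expression test.
function checkMatches(text: string, pattern: string): boolean {
  return new RegExp(pattern).test(text);
}

// Evaluates one argument value against a (possibly nested) matcher.
function matchValue(value: unknown, m: Matcher): boolean {
  if (Array.isArray(m)) return m.every((sub) => matchValue(value, sub)); // arrays default to `and`
  if (m === null || typeof m !== "object") return value === m; // scalars default to `eq`
  if ("eq" in m) return value === m.eq;
  if ("includes" in m) return typeof value === "string" && checkIncludes(value, m.includes);
  if ("matches" in m) return typeof value === "string" && checkMatches(value, m.matches);
  if ("and" in m) return m.and.every((sub) => matchValue(value, sub));
  if ("or" in m) return m.or.some((sub) => matchValue(value, sub));
  if ("not" in m) return !matchValue(value, m.not);
  return false;
}

// `callsTool`: the short form checks only the name; the long form also checks arguments.
function callsTool(calls: ToolCall[], expected: CallsTool): boolean {
  if (typeof expected === "string") return calls.some((c) => c.name === expected);
  return calls.some(
    (c) =>
      c.name === expected.name &&
      Object.entries(expected.arguments ?? {}).every(([key, m]) => matchValue(c.arguments[key], m)),
  );
}

// The markdown-link pattern, decoded from the JSON string form used in package.json.
const linkPattern: string = JSON.parse(
  String.raw`"\\[([^\\]]+)\\]\\(([^\\s\\)]+)(?:\\s+\"([^\"]+)\")?\\)"`,
);
console.log(checkIncludes("Added the todo", "added")); // true
console.log(checkMatches("See [the docs](https://example.com)", linkPattern)); // true

const calls: ToolCall[] = [
  { name: "create-comment", arguments: { issueId: "ISS-1", body: "Blocked, waiting for design review" } },
];
console.log(callsTool(calls, "create-comment")); // true
console.log(
  callsTool(calls, {
    name: "create-comment",
    arguments: { issueId: "ISS-1", body: { includes: "waiting for design" } },
  }),
); // true
console.log(callsTool(calls, { name: "create-comment", arguments: { issueId: { not: "ISS-1" } } })); // false
```

Treating an array as "and" and a scalar as "eq" mirrors the defaults listed above; every other matcher is dispatched on the single key of its object, which is what lets rules nest (for example an "or" of "includes" checks).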