
Context Generation

Overview

Contextica includes an optimization engine that automatically generates, evaluates, and selects the best prompt (called a context) for each AI function.

Contextica treats prompt creation as a search problem. Instead of a developer having to explore a potentially very large space of candidate prompts by hand, Contextica explores it programmatically.

Instead of writing prompts manually, you declare the function’s intent and provide optional examples. Contextica then explores candidate prompts, scores them against your examples, and stores the highest-scoring prompt for runtime use.
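
For orientation, a declaration might look roughly like the sketch below. Only the annotation names @AIService and @AIFunction come from these docs; the interface shape, method signature, and return type are assumptions made for illustration.

// Illustrative sketch only: the interface shape and method signature are assumptions;
// only the @AIService and @AIFunction annotation names appear in these docs.
@AIService
public interface ProductCopywriter {

    // No prompt text is written by hand. Contextica generates, scores, and stores
    // the winning context for this function during the generate-contexts run.
    @AIFunction
    String generateProductDescription(String specifications);
}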


Candidate Generation

The optimization process begins by generating a batch of prompt candidates using an LLM.

  • Each candidate is a variation of the prompt tailored to the function’s description, task type, and examples.
  • Candidates are evaluated and filtered according to the optimization strategy for that task type.

This ensures that different prompts are explored before deciding on the best one.
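
As a rough mental model, candidate generation can be pictured with the hedged sketch below. This is not Contextica's internal code; generateVariation stands in for the LLM call that produces one prompt variation from the function's description, task type, and examples.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Conceptual sketch only -- not Contextica's internal code.
public class CandidateGenerationSketch {

    // generateVariation is a hypothetical stand-in for the LLM call that produces
    // one prompt variation tailored to the function's description, task type, and examples.
    public static List<String> generateBatch(int batchSize, Supplier<String> generateVariation) {
        List<String> candidates = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            candidates.add(generateVariation.get());
        }
        // The batch is then scored and filtered by the optimization strategy
        // chosen for the task type (see the strategies below).
        return candidates;
    }
}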


Optimization Strategies

Contextica currently supports two main optimization strategies. The choice of strategy depends on the task type.

Beam Search (Hill Climb)

  • Iteratively generates and rewrites candidates.
  • Keeps the top-scoring candidates at each step (the “beam”).
  • Continues until no better candidates are found or the iteration limit is reached.
  • Works well for extraction, classification, and structured output tasks.
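
The sketch below illustrates the loop described above. It is a conceptual approximation, not Contextica's implementation; rewrite and score are hypothetical stand-ins for the LLM rewrite step and the example-based scoring described under Evaluation and Scoring.

import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

// Conceptual sketch of the beam search (hill climb) loop -- not Contextica's implementation.
public class BeamSearchSketch {

    public static String optimize(
            List<String> initialCandidates,
            Function<String, List<String>> rewrite,   // hypothetical: LLM rewrites of a prompt
            Function<String, Double> score,           // average score across the provided examples
            int beamWidth,
            int maxIterations) {

        List<String> beam = topK(initialCandidates, score, beamWidth);
        for (int i = 0; i < maxIterations; i++) {
            // Rewrite every prompt in the beam and rescore the combined pool.
            List<String> pool = Stream.concat(
                    beam.stream(),
                    beam.stream().flatMap(p -> rewrite.apply(p).stream())).toList();
            List<String> nextBeam = topK(pool, score, beamWidth);
            // Stop once no rewrite beats the current best candidate.
            if (score.apply(nextBeam.get(0)) <= score.apply(beam.get(0))) {
                break;
            }
            beam = nextBeam;
        }
        return beam.get(0);
    }

    private static List<String> topK(List<String> candidates, Function<String, Double> score, int k) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((String c) -> score.apply(c)).reversed())
                .limit(k)
                .toList();
    }
}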

LLM-as-a-Judge

  • Generates several candidate prompts.
  • Uses an LLM to evaluate and grade each candidate against provided examples.
  • Selects the best-performing candidate based on aggregated scores.
  • Works well for generative and summarization tasks where qualitative evaluation is required.
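
A hedged sketch of the same idea: the Judge interface below is a stand-in for the grading LLM, not a Contextica API, and the aggregation shown is a simple average of per-example grades.

import java.util.Comparator;
import java.util.List;

// Conceptual sketch of LLM-as-a-Judge selection -- not a Contextica API.
public class LlmJudgeSketch {

    // Hypothetical judge: returns a numeric grade for one candidate prompt on one example.
    interface Judge {
        double grade(String candidatePrompt, String exampleInput, String expectedOutput);
    }

    record Example(String input, String expectedOutput) {}

    public static String selectBest(List<String> candidates, List<Example> examples, Judge judge) {
        return candidates.stream()
                .max(Comparator.comparingDouble((String candidate) ->
                        // Aggregate the judge's grades by averaging across all examples.
                        examples.stream()
                                .mapToDouble(ex -> judge.grade(candidate, ex.input(), ex.expectedOutput()))
                                .average()
                                .orElse(0.0)))
                .orElseThrow();
    }
}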

Strategy Defaults

By default, Contextica selects an optimization strategy based on the task type.

Task type     Optimization Strategy
GENERATE      LLM-as-a-Judge
SUMMARIZE     LLM-as-a-Judge
EXTRACT       Beam Search (Hill Climb)
CLASSIFY      Beam Search (Hill Climb)
TRANSFORM     Beam Search (if examples provided), otherwise default template
TRANSLATE     Default template (no optimization)
CHAT          Default template (no optimization)
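
In practice this means the strategy follows from the taskType you declare in the function's configuration. For instance, the EXTRACT configuration below would be optimized with Beam Search (Hill Climb) by default; the method name is hypothetical, and the builder calls mirror the configuration example shown later on this page.

@AIFunctionConfiguration
public FunctionConfiguration extractInvoiceTotal() {
    return FunctionConfiguration.builder()
        .description("Extract the invoice total from an order confirmation email")
        .taskType(TaskType.EXTRACT)   // EXTRACT defaults to Beam Search (Hill Climb)
        .examples(...)                // examples drive candidate scoring during optimization
        .build();
}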

Evaluation and Scoring

After candidate prompts are generated, Contextica evaluates their performance using strategies that depend on the task type and return type.

Task type    Return type    Evaluation Strategy
EXTRACT      String         SPAN_F1
Any          String         Embedding Similarity
Any          Integer        Exact Match
Any          Long           Exact Match
Any          Float          Exact Match
Any          Short          Exact Match
Any          Enum           Exact Match

  • Each candidate is tested against all examples provided in the configuration.
  • Scores are averaged across examples to produce a final score.
  • The candidate with the highest score is stored as the winning context for that function.
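
The scoring step can be pictured with the hedged sketch below. It is not Contextica's code: runPrompt stands in for executing a candidate prompt on an example input, and exactMatch is a minimal version of the Exact Match strategy from the table above (Embedding Similarity or SPAN_F1 would take its place for the corresponding task and return types).

import java.util.List;
import java.util.function.BiFunction;

// Conceptual sketch of candidate scoring -- not Contextica's code.
public class ScoringSketch {

    record Example(String input, String expectedOutput) {}

    // Run the candidate prompt on every example, score each output, and average the scores.
    public static double averageScore(
            String candidatePrompt,
            List<Example> examples,
            BiFunction<String, String, String> runPrompt,           // hypothetical: LLM call
            BiFunction<String, String, Double> evaluationStrategy) { // e.g. exact match

        return examples.stream()
                .mapToDouble(ex -> evaluationStrategy.apply(
                        runPrompt.apply(candidatePrompt, ex.input()),
                        ex.expectedOutput()))
                .average()
                .orElse(0.0);
    }

    // Minimal exact-match strategy: 1.0 when the output equals the expected value, else 0.0.
    public static double exactMatch(String actual, String expected) {
        return actual.equals(expected) ? 1.0 : 0.0;
    }
}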

Running Context Generation

To run the context generation process:

  1. Define your @AIService and @AIFunction.
  2. Implement a configuration class with @AIFunctionConfiguration.
  3. Build your project:
mvn clean install

Then start the generate-contexts process using Maven:

mvn contextica:generate-contexts

The duration depends on the number of functions and examples. Optimization can take several minutes for complex cases.

Recommendation: Optimize one @AIFunction at a time for faster runs and easier troubleshooting.

Locking Functions

Once you are satisfied with the generated context for a function, mark it as locked. Locked functions are skipped during future optimization runs, which prevents overwriting a prompt that already performs well, reduces runtime costs, and avoids regressions.

Example:

@AIFunctionConfiguration
public FunctionConfiguration generateProductDescription() {
    return FunctionConfiguration.builder()
        .description("Generate a product description from specifications")
        .taskType(TaskType.GENERATE)
        .locked(true)
        .examples(...) // Optional optimization examples
        .build();
}

Locking functions is a best practice once you have validated the quality of the generated context.