
Context Generation

Overview

Contextica includes an optimization engine that automatically generates, evaluates, and selects the best prompt (called a context) for each AI function.

Contextica treats prompt creation as a search problem. Instead of a developer having to explore a potentially very large space of candidate prompts by hand, Contextica explores it programmatically.

Instead of writing prompts manually, you declare the function’s intent and provide optional examples. Contextica then explores candidate prompts, scores them against your examples, and stores the highest-scoring prompt for runtime use.
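
For orientation, a declaration might look roughly like the sketch below. Only the annotation names @AIService and @AIFunction come from these docs; the interface shape, method signature, and return type are assumptions made for illustration.

// Illustrative sketch only: the interface shape and method signature are assumptions;
// only the @AIService and @AIFunction annotation names appear in these docs.
@AIService
public interface ProductCopywriter {

    // No prompt text is written by hand. Contextica generates, scores, and stores
    // the winning context for this function during the generate-contexts run.
    @AIFunction
    String generateProductDescription(String specifications);
}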


Candidate Generation

The optimization process begins by generating a batch of prompt candidates using an LLM.

  • Each candidate is a variation of the prompt tailored to the function’s description, task type, and examples.
  • Candidates are evaluated and filtered according to the optimization strategy for that task type.

This ensures that different prompts are explored before deciding on the best one.
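
As a rough mental model, candidate generation can be pictured with the hedged sketch below. This is not Contextica's internal code; generateVariation stands in for the LLM call that produces one prompt variation from the function's description, task type, and examples.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Conceptual sketch only -- not Contextica's internal code.
public class CandidateGenerationSketch {

    // generateVariation is a hypothetical stand-in for the LLM call that produces
    // one prompt variation tailored to the function's description, task type, and examples.
    public static List<String> generateBatch(int batchSize, Supplier<String> generateVariation) {
        List<String> candidates = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            candidates.add(generateVariation.get());
        }
        // The batch is then scored and filtered by the optimization strategy
        // chosen for the task type (see the strategies below).
        return candidates;
    }
}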


Optimization Strategies

Contextica currently supports two main optimization strategies. The choice of strategy depends on the task type.

Beam Search (Hill Climb)

  • Iteratively generates and rewrites candidates.
  • Keeps the top-scoring candidates at each step (the “beam”).
  • Continues until no better candidates are found or the iteration limit is reached.
  • Works well for extraction, classification, and structured output tasks.
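
The sketch below illustrates the loop described above. It is a conceptual approximation, not Contextica's implementation; rewrite and score are hypothetical stand-ins for the LLM rewrite step and the example-based scoring described under Evaluation and Scoring.

import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

// Conceptual sketch of the beam search (hill climb) loop -- not Contextica's implementation.
public class BeamSearchSketch {

    public static String optimize(
            List<String> initialCandidates,
            Function<String, List<String>> rewrite,   // hypothetical: LLM rewrites of a prompt
            Function<String, Double> score,           // average score across the provided examples
            int beamWidth,
            int maxIterations) {

        List<String> beam = topK(initialCandidates, score, beamWidth);
        for (int i = 0; i < maxIterations; i++) {
            // Rewrite every prompt in the beam and rescore the combined pool.
            List<String> pool = Stream.concat(
                    beam.stream(),
                    beam.stream().flatMap(p -> rewrite.apply(p).stream())).toList();
            List<String> nextBeam = topK(pool, score, beamWidth);
            // Stop once no rewrite beats the current best candidate.
            if (score.apply(nextBeam.get(0)) <= score.apply(beam.get(0))) {
                break;
            }
            beam = nextBeam;
        }
        return beam.get(0);
    }

    private static List<String> topK(List<String> candidates, Function<String, Double> score, int k) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((String c) -> score.apply(c)).reversed())
                .limit(k)
                .toList();
    }
}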

LLM-as-a-Judge

  • Generates several candidate prompts.
  • Uses an LLM to evaluate and grade each candidate against provided examples.
  • Selects the best-performing candidate based on aggregated scores.
  • Works well for generative and summarization tasks where qualitative evaluation is required.
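
A hedged sketch of the same idea: the Judge interface below is a stand-in for the grading LLM, not a Contextica API, and the aggregation shown is a simple average of per-example grades.

import java.util.Comparator;
import java.util.List;

// Conceptual sketch of LLM-as-a-Judge selection -- not a Contextica API.
public class LlmJudgeSketch {

    // Hypothetical judge: returns a numeric grade for one candidate prompt on one example.
    interface Judge {
        double grade(String candidatePrompt, String exampleInput, String expectedOutput);
    }

    record Example(String input, String expectedOutput) {}

    public static String selectBest(List<String> candidates, List<Example> examples, Judge judge) {
        return candidates.stream()
                .max(Comparator.comparingDouble((String candidate) ->
                        // Aggregate the judge's grades by averaging across all examples.
                        examples.stream()
                                .mapToDouble(ex -> judge.grade(candidate, ex.input(), ex.expectedOutput()))
                                .average()
                                .orElse(0.0)))
                .orElseThrow();
    }
}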

Strategy Defaults

By default, Contextica selects an optimization strategy based on the task type.

Task type     Optimization Strategy
GENERATE      LLM-as-a-Judge
SUMMARIZE     LLM-as-a-Judge
EXTRACT       Beam Search (Hill Climb)
CLASSIFY      Beam Search (Hill Climb)
TRANSFORM     Beam Search (if examples provided), otherwise default template
TRANSLATE     Default template (no optimization)
CHAT          Default template (no optimization)
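
In practice this means the strategy follows from the taskType you declare in the function's configuration. For instance, the EXTRACT configuration below would be optimized with Beam Search (Hill Climb) by default; the method name is hypothetical, and the builder calls mirror the configuration example shown later on this page.

@AIFunctionConfiguration
public FunctionConfiguration extractInvoiceTotal() {
    return FunctionConfiguration.builder()
        .description("Extract the invoice total from an order confirmation email")
        .taskType(TaskType.EXTRACT)   // EXTRACT defaults to Beam Search (Hill Climb)
        .examples(...)                // examples drive candidate scoring during optimization
        .build();
}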

Evaluation and Scoring

After candidate prompts are generated, Contextica evaluates their performance using strategies that depend on the task type and return type.

Task type    Return type    Evaluation Strategy
EXTRACT      String         SPAN_F1
Any          String         Embedding Similarity
Any          Integer        Exact Match
Any          Long           Exact Match
Any          Float          Exact Match
Any          Short          Exact Match
Any          Enum           Exact Match

  • Each candidate is tested against all examples provided in the configuration.
  • Scores are averaged across examples to produce a final score.
  • The candidate with the highest score is stored as the winning context for that function.
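
The scoring step can be pictured with the hedged sketch below. It is not Contextica's code: runPrompt stands in for executing a candidate prompt on an example input, and exactMatch is a minimal version of the Exact Match strategy from the table above (Embedding Similarity or SPAN_F1 would take its place for the corresponding task and return types).

import java.util.List;
import java.util.function.BiFunction;

// Conceptual sketch of candidate scoring -- not Contextica's code.
public class ScoringSketch {

    record Example(String input, String expectedOutput) {}

    // Run the candidate prompt on every example, score each output, and average the scores.
    public static double averageScore(
            String candidatePrompt,
            List<Example> examples,
            BiFunction<String, String, String> runPrompt,           // hypothetical: LLM call
            BiFunction<String, String, Double> evaluationStrategy) { // e.g. exact match

        return examples.stream()
                .mapToDouble(ex -> evaluationStrategy.apply(
                        runPrompt.apply(candidatePrompt, ex.input()),
                        ex.expectedOutput()))
                .average()
                .orElse(0.0);
    }

    // Minimal exact-match strategy: 1.0 when the output equals the expected value, else 0.0.
    public static double exactMatch(String actual, String expected) {
        return actual.equals(expected) ? 1.0 : 0.0;
    }
}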

Running Context Generation

To run the context generation process:

  1. Define your @AIService and @AIFunction.
  2. Implement a configuration class with @AIFunctionConfiguration.
  3. Build your project:
mvn clean install

Then start the generate-contexts process using Maven:

mvn contextica:generate-contexts

The duration depends on the number of functions and examples. Optimization can take several minutes for complex cases.

Recommendation: Optimize one @AIFunction at a time for faster runs and easier troubleshooting.

Locking Functions

Once you are satisfied with the generated context for a function, mark it as locked. Locked functions are skipped during future optimization runs, which prevents overwriting a prompt that already performs well, reduces runtime costs, and avoids regressions.

Example:

@AIFunctionConfiguration
public FunctionConfiguration generateProductDescription() {
    return FunctionConfiguration.builder()
        .description("Generate a product description from specifications")
        .taskType(TaskType.GENERATE)
        .locked(true)
        .examples(...) // Optional optimization examples
        .build();
}

Locking functions is a best practice once you have validated the quality of the generated context.