Prompt Engineering: Beginner to Advanced

Prompt engineering means writing better instructions for AI models. Clear prompts usually give better results. This is one of the most important beginner skills in AI.

Best for

Readers who want a practical, role-based learning guide with clear progression from fundamentals to advanced implementation.

Not ideal for

Visitors looking for a short definition page without examples, sections, or a guided learning path.

What Is Prompt Engineering?

Prompt engineering is the skill of writing input instructions in a clear and useful way.

The better the prompt, the better the AI output is likely to be.

A weak prompt often produces a vague answer. A strong prompt is more likely to produce a focused one.

Simple Prompt Structure

A strong prompt usually includes role, instruction, context, and expected output format.

For example, instead of saying 'Tell me about Python', you can say 'Explain Python in simple language for beginners and give 3 examples.' The second version names the audience and the expected output.

Role + Instruction + Context + Output Format
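
The four-part structure above can be sketched as a small helper that assembles a prompt from its parts. This is only an illustration: the function name `build_prompt` and the example wording are assumptions, not part of any library.

```python
def build_prompt(role: str, instruction: str, context: str, output_format: str) -> str:
    """Join the four parts into one prompt, skipping any part left empty."""
    parts = [
        f"You are {role}." if role else "",
        instruction,
        f"Context: {context}" if context else "",
        f"Output format: {output_format}" if output_format else "",
    ]
    return "\n\n".join(part for part in parts if part)


# Hypothetical example values for each of the four parts.
prompt = build_prompt(
    role="a patient programming teacher",
    instruction="Explain Python in simple language for beginners.",
    context="The reader has never written code before.",
    output_format="Three short paragraphs, then 3 examples as a numbered list.",
)
print(prompt)
```

Keeping the parts separate makes it easy to change one of them, such as the output format, and compare results.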

Python Example

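This example shows a better-structured prompt sent to a chat model through LangChain.

# Requires: pip install langchain-openai, and OPENAI_API_KEY set in the environment
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

prompt = """
You are a teacher.

Explain Artificial Intelligence in simple language.
Give 3 real world examples.
Use short paragraphs.
"""

# invoke() returns a message object; the generated text is in .content
response = llm.invoke(prompt)

print(response.content)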

Intermediate: Few-Shot Prompting

Few-shot prompting means giving the model 2 to 5 examples of what you want before the actual question. This usually makes the output more consistent in content and format.

The model learns from your examples and applies the same pattern to the new input. This works better than long instructions for formatting tasks.

Use few-shot prompting when you need the model to follow a very specific output structure.

# Requires: pip install langchain-core langchain-openai
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

# Define example question → answer pairs
examples = [
    {"question": "What is photosynthesis?", "answer": "Photosynthesis is the process by which plants use sunlight to make food from water and carbon dioxide."},
    {"question": "What is gravity?", "answer": "Gravity is the force that pulls objects toward each other. On Earth, it pulls us toward the ground."},
    {"question": "What is democracy?", "answer": "Democracy is a system of government where citizens vote to choose their leaders and make decisions."},
]

# Template for each example
example_template = PromptTemplate(
    input_variables=["question", "answer"],
    template="Question: {question}\nAnswer: {answer}"
)

# Build the few-shot prompt template
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_template,
    prefix="Answer each question in one clear sentence, in simple language.",
    suffix="Question: {input}\nAnswer:",
    input_variables=["input"]
)

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = few_shot_prompt | llm

# Test with a new question
result = chain.invoke({"input": "What is artificial intelligence?"})
print(result.content)
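
To see roughly what text the model receives, the same few-shot layout can be rendered in plain Python. This is a sketch that approximates the template's output and does not call LangChain; the helper name `render_few_shot` is an assumption made for illustration.

```python
examples = [
    {"question": "What is gravity?",
     "answer": "Gravity is the force that pulls objects toward each other."},
    {"question": "What is democracy?",
     "answer": "Democracy is a system of government where citizens vote to choose their leaders."},
]


def render_few_shot(prefix: str, examples: list[dict], new_question: str) -> str:
    """Render the prefix, the worked examples, and the open question as one string."""
    blocks = [prefix]
    for ex in examples:
        blocks.append(f"Question: {ex['question']}\nAnswer: {ex['answer']}")
    # The prompt ends with an unanswered "Answer:" so the model continues the pattern.
    blocks.append(f"Question: {new_question}\nAnswer:")
    return "\n\n".join(blocks)


text = render_few_shot(
    "Answer each question in one clear sentence, in simple language.",
    examples,
    "What is artificial intelligence?",
)
print(text)
```

Printing the rendered prompt is a quick way to check that every example follows exactly the format you want the model to copy.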

Intermediate: Chain-of-Thought Prompting

Chain-of-Thought (CoT) prompting tells the model to show its reasoning step by step before giving a final answer. This often improves accuracy on problems that need several reasoning steps.

The simplest form of CoT is to add 'Think step by step before answering' to your prompt. More advanced versions provide example reasoning chains.

CoT is especially powerful for math, logic, multi-step analysis, and decision-making tasks.

# Requires: pip install langchain-core langchain-openai
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Without CoT — direct answer (often less accurate for complex questions)
basic_prompt = "A student studied 2 hours on Monday, 3 hours on Tuesday, and 1.5 hours on Wednesday. If their goal is 10 hours per week, how many more hours do they need to study?"

# With CoT — ask model to reason step by step
cot_prompt = """A student studied 2 hours on Monday, 3 hours on Tuesday, and 1.5 hours on Wednesday.
Their goal is 10 hours per week.
How many more hours do they need to study?

Think through this step by step:
1. First, calculate the total hours studied so far.
2. Then subtract from the weekly goal.
3. State the final answer clearly."""

# Compare responses
print("Without CoT:")
print(llm.invoke([HumanMessage(content=basic_prompt)]).content)

print("\nWith Chain-of-Thought:")
print(llm.invoke([HumanMessage(content=cot_prompt)]).content)
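
A CoT response mixes reasoning with the result, so it can help to ask for a fixed final line and parse it. The sketch below assumes a hypothetical instruction (`ANSWER_INSTRUCTION`) appended to the prompt and uses a hand-written sample response rather than real model output.

```python
import re
from typing import Optional

# Hypothetical suffix for a CoT prompt so the answer is easy to locate.
ANSWER_INSTRUCTION = "End with a line of the form 'Final answer: <number> hours'."


def extract_final_answer(response: str) -> Optional[float]:
    """Return the number after the last 'Final answer:' marker, or None if absent."""
    matches = re.findall(r"final answer:\s*(-?\d+(?:\.\d+)?)", response, flags=re.IGNORECASE)
    return float(matches[-1]) if matches else None


# Hand-written sample response (not real model output), used to exercise the parser.
sample_response = """1. Total studied: 2 + 3 + 1.5 = 6.5 hours.
2. Remaining: 10 - 6.5 = 3.5 hours.
Final answer: 3.5 hours"""

expected = 10 - (2 + 3 + 1.5)  # ground truth for the study-hours question
answer = extract_final_answer(sample_response)
print(answer, answer == expected)
```

Comparing the parsed number with a known answer turns the side-by-side comparison into a check that can run automatically.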

Intermediate: Controlling Output Format

For applications that need structured data, you must instruct the model to return output in a specific format like JSON, numbered lists, or a table.

The more explicit your format instructions, the more reliable the output. Always include an example of the expected format.

Combine format instructions with a parser to automatically convert the model's text output into Python objects.

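# Requires: pip install langchain langchain-core langchain-openai
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_openai import ChatOpenAI
# StructuredOutputParser comes from the legacy langchain package; newer releases may relocate it
from langchain.output_parsers import StructuredOutputParser, ResponseSchema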

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Example 1: Get a comma-separated list
list_parser = CommaSeparatedListOutputParser()
list_prompt = ChatPromptTemplate.from_template(
    "List 5 careers in data science. {format_instructions}"
)

list_chain = list_prompt | llm | list_parser
careers = list_chain.invoke({
    "format_instructions": list_parser.get_format_instructions()
})
print("Career list:", careers)

# Example 2: Get structured JSON with multiple fields
response_schemas = [
    ResponseSchema(name="topic", description="The topic name"),
    ResponseSchema(name="difficulty", description="beginner, intermediate, or advanced"),
    ResponseSchema(name="resources", description="comma-separated list of 3 learning resources"),
]

structured_parser = StructuredOutputParser.from_response_schemas(response_schemas)
structured_prompt = ChatPromptTemplate.from_template(
    "Give me study information about: {subject}\n{format_instructions}"
)

chain = structured_prompt | llm | structured_parser
info = chain.invoke({
    "subject": "machine learning",
    "format_instructions": structured_parser.get_format_instructions()
})

print("\nStructured Output:")
print(f"Topic: {info['topic']}")
print(f"Difficulty: {info['difficulty']}")
print(f"Resources: {info['resources']}")
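
Parsers can still fail when the model wraps its JSON in extra text or leaves out a field. As a minimal sketch using only the standard library, the helper below extracts the JSON object and checks for required keys. The names `parse_json_reply` and `REQUIRED_KEYS` are assumptions, and the sample reply is hand-written rather than real model output.

```python
import json

REQUIRED_KEYS = {"topic", "difficulty", "resources"}


def parse_json_reply(reply: str) -> dict:
    """Parse a JSON object from a model reply and check that required keys exist."""
    text = reply.strip()
    # Models often add prose or a Markdown fence around the JSON,
    # so keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("No JSON object found in reply")
    data = json.loads(text[start:end + 1])
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


# Hand-written sample reply (not real model output).
sample_reply = (
    'Here you go:\n'
    '{"topic": "machine learning", "difficulty": "intermediate", '
    '"resources": "a textbook, a course, a tutorial"}'
)
info = parse_json_reply(sample_reply)
print(info["difficulty"])
```

Raising an error on a bad reply lets the calling code retry the request or fall back instead of silently using incomplete data.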

Intermediate: Role-Based and Persona Prompting

Assigning a role to the model changes how it responds. A 'teacher explaining to a 10-year-old' gives very different output from 'a university professor'.

Role prompting is also used to constrain behavior: 'You are a support agent who only discusses product features. Do not answer questions about competitors.'

Combine role, constraints, tone, and audience in the system message for the most consistent results.

# Requires: pip install langchain-core langchain-openai
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)

# The same question asked with 3 different personas
question = "Explain how computers work."

personas = [
    {
        "name": "Simple Teacher",
        "system": "You are a friendly teacher explaining things to a curious 10-year-old. Use simple words, short sentences, and fun comparisons."
    },
    {
        "name": "Technical Expert",
        "system": "You are a computer science professor. Use precise technical terminology and give a detailed technical explanation."
    },
    {
        "name": "Career Counselor",
        "system": "You are a career counselor helping students decide if computer science is right for them. Focus on practical implications and career relevance."
    }
]

for persona in personas:
    messages = [
        SystemMessage(content=persona["system"]),
        HumanMessage(content=question)
    ]
    response = llm.invoke(messages).content
    print(f"--- {persona['name']} ---")
    print(response[:200] + "...")
    print()
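
The advice to combine role, constraints, tone, and audience in the system message can be sketched as a small data class. The class name `Persona` and its fields are illustrative assumptions; the constraint text reuses the support-agent example from this section.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    role: str
    audience: str
    tone: str
    constraints: list[str] = field(default_factory=list)

    def to_system_message(self) -> str:
        """Render the persona as system-message text, one statement per line."""
        lines = [
            f"You are {self.role}.",
            f"Your audience is {self.audience}.",
            f"Use a {self.tone} tone.",
        ]
        if self.constraints:
            lines.append("Rules:")
            lines.extend(f"- {rule}" for rule in self.constraints)
        return "\n".join(lines)


support_agent = Persona(
    role="a support agent for a software product",
    audience="customers with little technical background",
    tone="friendly, concise",
    constraints=[
        "Only discuss product features.",
        "Do not answer questions about competitors.",
    ],
)
print(support_agent.to_system_message())
```

The resulting string can be passed as the content of a system message, in the same way as the persona strings in the loop above.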

Advanced: Building a Prompt Evaluation Suite

A prompt evaluation suite is a set of test cases that you run against your prompts to measure quality before deploying changes.

Define test cases with: input, expected keywords or themes, and optionally a reference answer. Score each response against criteria.

Run evaluations on every prompt change. Track scores over time in a log file. Never deploy a prompt that reduces your baseline score.

import json
from datetime import datetime
# Requires: pip install langchain-core langchain-openai
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

SYSTEM_PROMPT = """You are a helpful study assistant. Answer clearly and accurately."""

# Define test suite
TEST_CASES = [
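    {
        "id": "TC001",
        "input": "What is the speed of light?",
        "must_contain": ["299"],  # 299,792,458 m/s or about 299,792 km/s
        # Any one unit spelling is enough; requiring all of them would fail correct answers
        "must_contain_any": ["m/s", "km/s", "meters per second", "metres per second", "kilometer", "kilometre"],
        "must_not_contain": ["slow", "unknown"],
        "min_words": 10,
        "max_words": 80
    },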
    {
        "id": "TC002",
        "input": "Name three programming languages.",
        "must_contain_any": ["python", "java", "javascript", "c++", "ruby", "go", "rust"],
        "must_not_contain": ["math", "painting"],
        "min_count": 3  # Must mention at least 3 items
    },
    {
        "id": "TC003",
        "input": "Is Python or Java better?",
        "must_contain": ["depends", "use case"],  # Should give nuanced answer
        "must_not_contain": ["always", "definitely better"],
        "min_words": 20
    }
]

def evaluate_response(response: str, test_case: dict) -> dict:
    response_lower = response.lower()
    issues = []

    if "must_contain" in test_case:
        for kw in test_case["must_contain"]:
            if kw.lower() not in response_lower:
                issues.append(f"Missing keyword: {kw}")

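    if "must_contain_any" in test_case:
        # Substring matching is approximate: "java" also matches "javascript".
        matches = [kw for kw in test_case["must_contain_any"] if kw.lower() in response_lower]
        required = test_case.get("min_count", 1)  # default: a single match is enough
        if len(matches) < required:
            issues.append(f"Expected at least {required} of the listed keywords, found {len(matches)}")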

    if "must_not_contain" in test_case:
        for kw in test_case["must_not_contain"]:
            if kw.lower() in response_lower:
                issues.append(f"Should not contain: {kw}")

    word_count = len(response.split())
    if "min_words" in test_case and word_count < test_case["min_words"]:
        issues.append(f"Too short: {word_count} words (min {test_case['min_words']})")
    if "max_words" in test_case and word_count > test_case["max_words"]:
        issues.append(f"Too long: {word_count} words (max {test_case['max_words']})")

    passed = len(issues) == 0
    return {"passed": passed, "issues": issues, "word_count": word_count}

# Run all test cases
results = []
for tc in TEST_CASES:
    messages = [
        SystemMessage(content=SYSTEM_PROMPT),
        HumanMessage(content=tc["input"])
    ]
    response = llm.invoke(messages).content
    evaluation = evaluate_response(response, tc)
    
    status = "PASS" if evaluation["passed"] else "FAIL"
    results.append({"id": tc["id"], "status": status, "issues": evaluation["issues"]})
    print(f"{tc['id']}: {status} | Issues: {evaluation['issues'] or 'None'}")

# Summary
passed = sum(1 for r in results if r["status"] == "PASS")
print(f"\nTotal: {passed}/{len(results)} passed — {round(passed/len(results)*100)}% pass rate")

# Append this run to the log, one JSON object per line (JSON Lines format)
log = {"timestamp": datetime.now().isoformat(), "results": results}
with open("eval_log.json", "a") as f:
    f.write(json.dumps(log) + "\n")
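
To act on the rule about never deploying below the baseline, the logged runs can be compared automatically. The sketch below assumes the log format written above (one JSON object per line with a `results` list) and treats the best earlier run as the baseline. That baseline definition and the function names are assumptions, and the sample log is hand-written rather than real results.

```python
import json


def pass_rate(run: dict) -> float:
    """Fraction of test cases with status PASS in one logged run."""
    results = run["results"]
    if not results:
        return 0.0
    return sum(r["status"] == "PASS" for r in results) / len(results)


def load_runs(log_text: str) -> list[dict]:
    """Parse log text that holds one JSON object per line."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]


def is_regression(runs: list[dict]) -> bool:
    """True if the newest run scores below the best earlier run (the baseline)."""
    if len(runs) < 2:
        return False
    baseline = max(pass_rate(run) for run in runs[:-1])
    return pass_rate(runs[-1]) < baseline


# Hand-written sample log (not real results).
sample_log = "\n".join([
    json.dumps({"timestamp": "run-1", "results": [
        {"id": "TC001", "status": "PASS"}, {"id": "TC002", "status": "PASS"}]}),
    json.dumps({"timestamp": "run-2", "results": [
        {"id": "TC001", "status": "PASS"}, {"id": "TC002", "status": "FAIL"}]}),
])
runs = load_runs(sample_log)
print([pass_rate(run) for run in runs], is_regression(runs))
```

In practice the log text would come from reading the log file, and a detected regression would block the prompt change until it is reviewed.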

Advanced: Reducing Hallucinations with Grounding Prompts

Hallucination is when the model confidently states false information. It is one of the biggest reliability risks in production AI applications.

Grounding techniques: provide factual context in the prompt, instruct the model to cite sources, ask it to say 'I don't know' instead of guessing, and use RAG to anchor responses to real documents.

Always test your prompts with trick questions and edge cases to see how the model handles uncertainty before deploying.

# Requires: pip install langchain-core langchain-openai
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Anti-hallucination system prompt — key strategies built in
ANTI_HALLUCINATION_PROMPT = """
You are a factual study assistant. You follow these rules strictly:

1. ONLY state facts you are confident about
2. If you are unsure, say "I'm not certain, but..." and clearly mark it as uncertain
3. If you don't know something, say "I don't have reliable information on that."
4. NEVER invent statistics, dates, names, or specific numbers
5. If asked for a specific number or fact you're unsure of, say "Please verify this with an authoritative source."
6. Keep responses grounded — only answer what is asked
"""

def safe_answer(question: str) -> str:
    messages = [
        SystemMessage(content=ANTI_HALLUCINATION_PROMPT),
        HumanMessage(content=question)
    ]
    return llm.invoke(messages).content

# Test with ambiguous and uncertain questions
test_questions = [
    "What is the exact population of Mars?",         # Should admit it's 0 / uncertain
    "Who won the Nobel Prize in Physics in 2019?",    # Model may or may not know
    "What is 2 + 2?",                                 # Confident, factual
    "What will the stock market do tomorrow?",        # Should refuse to predict
]

for q in test_questions:
    print(f"Q: {q}")
    print(f"A: {safe_answer(q)[:150]}")
    print()
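
The system prompt above relies on the model's own knowledge. The other grounding technique mentioned, providing factual context in the prompt and asking for citations, can be sketched as follows. The function name `build_grounded_prompt` and the sample passages are hypothetical; in a RAG setup the passages would come from a retrieval step.

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Number the passages and instruct the model to answer only from them."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered passages below.\n"
        "Cite the passage number for each claim, like [1].\n"
        "If the passages do not contain the answer, reply exactly: I don't know.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages standing in for retrieved documents.
passages = [
    "The school library is open from 8:00 to 18:00 on weekdays.",
    "The library is closed on public holidays.",
]
prompt = build_grounded_prompt("When does the library open on Monday?", passages)
print(prompt)
```

Numbered passages make it possible to check each cited claim against its source, which is harder when the model answers from memory.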

Project Milestones by Level

Beginner Project: Build a prompt template library for 3 use cases (study helper, explainer, quiz maker). Test each with 5 different inputs and compare output quality.

Intermediate Project: Create a few-shot prompt system with 5 examples for a specific domain (e.g., explaining science topics). Add CoT for complex questions and structured output parsing for consistent formatting.

Advanced Project: Build a complete prompt evaluation suite with 20+ test cases. Track pass rates across prompt versions. Write a before/after comparison report showing how prompt improvements changed scores.

Frequently Asked Questions

Why is prompt engineering important?

Because a better prompt usually gives a better and more useful AI response.

Do beginners need prompt engineering?

Yes. It is one of the first important skills in practical AI work.

How do I know if a prompt is production-ready?

A production-ready prompt gives consistent output quality across diverse test cases, handles edge cases, and has measurable evaluation criteria.