Problem Statement
Design an in-memory Log Analyzer and Parser engine (similar to the indexing engines in Elasticsearch or Splunk). The engine must ingest raw log strings, parse them into structured records containing timestamps, levels, and messages, build an Inverted Index mapping keywords to log records for expected constant-time keyword lookup, and evaluate complex boolean search queries (such as finding logs that contain the word "database" AND have the level ERROR).
Design Decisions & Patterns Used
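Scanning gigabytes of raw logs sequentially (with regular expressions or brute-force string matching) costs time proportional to the total log volume on every query, which is too slow for production. Instead, we build an **Inverted Index**: at ingestion time, each log message is tokenized into words, and each word is mapped to the set of log records that contain it. A single-keyword search then becomes a hash-map lookup, and complex boolean queries are resolved with set intersections (AND) and unions (OR), so query cost depends on the sizes of the matching record sets rather than on the total log volume.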
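A minimal, self-contained sketch of the idea follows. Integer line IDs stand in for LogRecord objects, and the class InvertedIndexSketch with its add(), lookup(), and(), and or() methods is hypothetical, not part of the engine built below. It only illustrates how AND and OR reduce to set intersection and union over posting sets:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an inverted index from word -> set of line IDs.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index one message under every lower-cased, whitespace-separated word.
    void add(int id, String message) {
        for (String word : message.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new HashSet<>()).add(id);
            }
        }
    }

    // Expected O(1) hash lookup of the posting set for one word.
    Set<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    // AND = intersection; copy first so the stored posting set is not modified.
    Set<Integer> and(String a, String b) {
        Set<Integer> result = new HashSet<>(lookup(a));
        result.retainAll(lookup(b));
        return result;
    }

    // OR = union of the two posting sets.
    Set<Integer> or(String a, String b) {
        Set<Integer> result = new HashSet<>(lookup(a));
        result.addAll(lookup(b));
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        List<String> lines = List.of(
                "database connection pool exhausted",
                "user logged in",
                "database socket link failed");
        for (int i = 0; i < lines.size(); i++) {
            index.add(i, lines.get(i));
        }
        System.out.println(index.and("database", "socket")); // [2]
        System.out.println(index.or("user", "socket"));      // [1, 2]
    }
}
```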
We will utilize the following Design Patterns:
- Composite Pattern: Building nested boolean query expression trees (e.g., combining keyword queries and log level queries using composite `AndQuery` and `OrQuery` nodes).
- Strategy Pattern: Encapsulating each filtering strategy (a keyword lookup or a log-level lookup) behind the common `LogQuery` interface, so strategies can be swapped or combined at runtime.
Functional Requirements
- Parse raw log strings (e.g., "2026-06-07 10:00:00 [ERROR] DB connection failed") into structured entities.
- Build an Inverted Index mapping keywords (tokenized from log messages) to matching log records.
- Build a secondary index mapping log levels (INFO, WARN, ERROR) to their matching records.
- Support complex queries using logical operations (AND, OR).
Objects Required
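- LogLevel (Enum of the INFO, WARN, and ERROR log levels)
- LogRecord (Immutable object storing structured log details)
- LogIndex (Inverted index registry mapping tokens and log levels to sets of records)
- LogQuery (Composite query interface, implemented by KeywordQuery, LevelQuery, AndQuery, and OrQuery)
- LogParser (Parses raw log lines into LogRecord objects)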
LogLevel Enum & LogRecord Class
The LogLevel enum represents log levels, and the LogRecord class stores parsed log metadata.
public enum LogLevel {
    INFO,
    WARN,
    ERROR
}
Let's define the LogRecord class:
public class LogRecord {
    private final String timestamp;
    private final LogLevel level;
    private final String message;
    private final String rawLog;

    public LogRecord(String timestamp, LogLevel level, String message, String rawLog) {
        this.timestamp = timestamp;
        this.level = level;
        this.message = message;
        this.rawLog = rawLog;
    }

    public String getTimestamp() { return timestamp; }
    public LogLevel getLevel() { return level; }
    public String getMessage() { return message; }
    public String getRawLog() { return rawLog; }

    @Override
    public String toString() {
        return rawLog;
    }
}
The constructor stores the parsed fields in final fields, making each record immutable once created, and the overridden toString() method returns the original raw log line.
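Because LogRecord does not override equals() and hashCode(), the index's HashSets compare records by object identity. Two byte-for-byte identical log lines (for example, the same error logged twice within one second) are therefore kept as separate records, which is usually what a log index wants. The hypothetical classes below (IdentityRecord mirrors LogRecord, ValueRecord adds value-based equality) contrast the two behaviors:

```java
import java.util.HashSet;
import java.util.Set;

class RecordEqualityDemo {
    // Mirrors LogRecord: no equals/hashCode override, so Object identity is used.
    static final class IdentityRecord {
        final String rawLog;
        IdentityRecord(String rawLog) { this.rawLog = rawLog; }
    }

    // Hypothetical alternative: records with the same raw text are "equal".
    static final class ValueRecord {
        final String rawLog;
        ValueRecord(String rawLog) { this.rawLog = rawLog; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueRecord && ((ValueRecord) o).rawLog.equals(rawLog);
        }

        @Override
        public int hashCode() { return rawLog.hashCode(); }
    }

    // Number of distinct elements when each line becomes a new IdentityRecord.
    static int distinctIdentity(String... lines) {
        Set<IdentityRecord> set = new HashSet<>();
        for (String line : lines) set.add(new IdentityRecord(line));
        return set.size();
    }

    // Number of distinct elements when each line becomes a new ValueRecord.
    static int distinctValue(String... lines) {
        Set<ValueRecord> set = new HashSet<>();
        for (String line : lines) set.add(new ValueRecord(line));
        return set.size();
    }

    public static void main(String[] args) {
        String line = "2026-06-07 10:01:05 [ERROR] Database connection pool exhausted";
        System.out.println(distinctIdentity(line, line)); // 2: both occurrences kept
        System.out.println(distinctValue(line, line));    // 1: duplicates collapse
    }
}
```

Adding value-based equality would silently merge repeated events, so leaving LogRecord with identity semantics is a reasonable default here.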
LogIndex Class (Inverted Index)
The LogIndex class manages the inverted index, tokenizing log messages into keywords and mapping them to their matching records.
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class LogIndex {
    private final Map<String, Set<LogRecord>> invertedIndex;
    private final Map<LogLevel, Set<LogRecord>> levelIndex;

    public LogIndex() {
        this.invertedIndex = new HashMap<>();
        this.levelIndex = new HashMap<>();
        for (LogLevel level : LogLevel.values()) {
            levelIndex.put(level, new HashSet<>());
        }
    }

    public void addRecord(LogRecord record) {
        // Index by log level
        levelIndex.get(record.getLevel()).add(record);
        // Index by message tokens (keywords)
        for (String token : tokenize(record.getMessage())) {
            // split() yields "" for blank or leading-whitespace input; never index it
            if (!token.isEmpty()) {
                invertedIndex.computeIfAbsent(token, k -> new HashSet<>()).add(record);
            }
        }
    }

    public Set<LogRecord> getRecordsByKeyword(String keyword) {
        // Normalize the search term exactly like indexed tokens, so "Database!" finds "database"
        String normalized = normalize(keyword).trim();
        // Read-only view: callers cannot accidentally modify the index
        return Collections.unmodifiableSet(
                invertedIndex.getOrDefault(normalized, Collections.emptySet()));
    }

    public Set<LogRecord> getRecordsByLevel(LogLevel level) {
        return Collections.unmodifiableSet(
                levelIndex.getOrDefault(level, Collections.emptySet()));
    }

    private static String normalize(String text) {
        // Lowercase (locale-independent) and strip punctuation
        return text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", "");
    }

    private static String[] tokenize(String text) {
        // Split the normalized text on runs of whitespace
        return normalize(text).split("\\s+");
    }
}
Here is an explanation of the core operations in the LogIndex class:
- The constructor initializes the inverted (token) index and the level index, pre-creating an empty set for every LogLevel.
- addRecord() registers a log record under its log level in the level index, tokenizes the message text, and associates the record with each token in the inverted index.
- getRecordsByKeyword() and getRecordsByLevel() return the records stored under a keyword or a level, or an empty set when nothing matches.
- tokenize() normalizes text by lowercasing it and stripping punctuation before splitting it into individual words.
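The normalization rules are easiest to judge on concrete input. This self-contained sketch re-implements the tokenizer logic in a hypothetical TokenizerDemo class so you can see which tokens a message produces. It assumes two small safeguards: empty strings produced by split() are dropped, and lowercasing uses Locale.ROOT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizerDemo {
    // Lowercase, strip punctuation, split on whitespace, and drop empty strings
    // (split() returns "" for blank or leading-whitespace input).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", "");
        for (String token : cleaned.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Memory usage exceeded 85% threshold!"));
        // [memory, usage, exceeded, 85, threshold]
        System.out.println(tokenize("DB-connection failed"));
        // [dbconnection, failed]
        System.out.println(tokenize("!!!"));
        // []
    }
}
```

Note the trade-off in the second call: because punctuation is deleted rather than replaced with a space, "DB-connection" becomes the single token dbconnection, so a search for "connection" would not match it. Replacing punctuation with a space would split such words instead, at the cost of also splitting tokens such as version numbers.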
LogQuery Interface & Implementations (Composite Pattern)
We apply the **Composite Design Pattern**. The LogQuery interface defines the query contract, and concrete queries (leaves and composites) execute search filters against the index.
import java.util.Set;

public interface LogQuery {
    Set<LogRecord> execute(LogIndex index);
}
Let's implement the leaf nodes: KeywordQuery and LevelQuery.
import java.util.Set;

public class KeywordQuery implements LogQuery {
    private final String keyword;

    public KeywordQuery(String keyword) {
        this.keyword = keyword;
    }

    @Override
    public Set<LogRecord> execute(LogIndex index) {
        return index.getRecordsByKeyword(keyword);
    }
}
The KeywordQuery returns the set of records mapped to a specific keyword.
import java.util.Set;

public class LevelQuery implements LogQuery {
    private final LogLevel level;

    public LevelQuery(LogLevel level) {
        this.level = level;
    }

    @Override
    public Set<LogRecord> execute(LogIndex index) {
        return index.getRecordsByLevel(level);
    }
}
The LevelQuery returns the set of records mapped to a specific log level.
Let's implement the composite nodes: AndQuery and OrQuery.
import java.util.HashSet;
import java.util.Set;

public class AndQuery implements LogQuery {
    private final LogQuery left;
    private final LogQuery right;

    public AndQuery(LogQuery left, LogQuery right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public Set<LogRecord> execute(LogIndex index) {
        // Copy the left result so the index's own sets are never modified
        Set<LogRecord> leftResult = new HashSet<>(left.execute(index));
        Set<LogRecord> rightResult = right.execute(index);
        // Retain only elements present in both sets (set intersection)
        leftResult.retainAll(rightResult);
        return leftResult;
    }
}
The AndQuery executes both child queries and returns the intersection of their result sets.
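AndQuery copies the left result and then calls retainAll(), which for a HashSet walks the copy and probes the right set once per element, so the work grows with the size of the left operand even when the right one is tiny. If a common keyword sits on the left, one possible optimization is to iterate whichever set is smaller. The helper below is a hypothetical sketch of that idea, not part of the original design:

```java
import java.util.HashSet;
import java.util.Set;

class IntersectionSketch {
    // Hypothetical helper: probe the larger set with each element of the smaller
    // one, so the cost is O(min(|a|, |b|)) expected hash lookups.
    // Neither input set is modified.
    static <T> Set<T> intersect(Set<T> a, Set<T> b) {
        Set<T> smaller = a.size() <= b.size() ? a : b;
        Set<T> larger = (smaller == a) ? b : a;
        Set<T> result = new HashSet<>();
        for (T item : smaller) {
            if (larger.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> errorLevel = Set.of(1, 3);            // e.g. record IDs at level ERROR
        Set<Integer> database = Set.of(1, 3, 7, 8, 9, 10); // e.g. record IDs containing "database"
        System.out.println(intersect(errorLevel, database).equals(Set.of(1, 3))); // true
    }
}
```

Inside AndQuery.execute(), the same approach would replace the copy-and-retainAll() steps; the result is identical, only the iteration order of the work changes.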
import java.util.HashSet;
import java.util.Set;

public class OrQuery implements LogQuery {
    private final LogQuery left;
    private final LogQuery right;

    public OrQuery(LogQuery left, LogQuery right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public Set<LogRecord> execute(LogIndex index) {
        // Copy the left result so the index's own sets are never modified
        Set<LogRecord> leftResult = new HashSet<>(left.execute(index));
        Set<LogRecord> rightResult = right.execute(index);
        // Add all elements from both sets (set union)
        leftResult.addAll(rightResult);
        return leftResult;
    }
}
The OrQuery executes both child queries and returns the union of their result sets.
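Because AndQuery and OrQuery accept any LogQuery as a child, queries nest into arbitrary trees. The self-contained sketch below mirrors that structure with lambdas and integer record IDs to keep it short; CompositeTreeSketch, term(), and the "level:error" key (standing in for LevelQuery) are hypothetical simplifications, not the engine's API. It evaluates (database OR memory) AND level ERROR:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CompositeTreeSketch {
    // Simplified stand-ins: an "index" maps a term to record IDs, and a query
    // is anything that can evaluate itself against that map.
    interface Query {
        Set<Integer> execute(Map<String, Set<Integer>> index);
    }

    // Leaf: look up a single term.
    static Query term(String t) {
        return index -> index.getOrDefault(t, Set.of());
    }

    // Composite: intersection of two sub-queries (either may itself be a composite).
    static Query and(Query left, Query right) {
        return index -> {
            Set<Integer> result = new HashSet<>(left.execute(index));
            result.retainAll(right.execute(index));
            return result;
        };
    }

    // Composite: union of two sub-queries.
    static Query or(Query left, Query right) {
        return index -> {
            Set<Integer> result = new HashSet<>(left.execute(index));
            result.addAll(right.execute(index));
            return result;
        };
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = Map.of(
                "database", Set.of(1, 3),
                "socket", Set.of(3),
                "memory", Set.of(2),
                "level:error", Set.of(1, 3),
                "level:warn", Set.of(2));

        // (database OR memory) AND level:error
        Query q = and(or(term("database"), term("memory")), term("level:error"));
        System.out.println(q.execute(index)); // [1, 3]
    }
}
```

With the engine's own classes, the equivalent tree would be new AndQuery(new OrQuery(new KeywordQuery("database"), new KeywordQuery("memory")), new LevelQuery(LogLevel.ERROR)).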
LogParser Class
The LogParser class parses raw log strings into structured LogRecord instances using regular expressions.
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParser {
    // Expected line format: yyyy-MM-dd HH:mm:ss [LEVEL] message text
    private static final Pattern LOG_PATTERN =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) \\[(\\w+)\\] (.+)$");

    public static LogRecord parse(String rawLog) {
        Matcher matcher = LOG_PATTERN.matcher(rawLog);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid log format: " + rawLog);
        }
        String timestamp = matcher.group(1);
        LogLevel level;
        try {
            // Locale.ROOT avoids locale-specific case mapping (e.g. the Turkish dotted I)
            level = LogLevel.valueOf(matcher.group(2).toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Unknown log level '" + matcher.group(2) + "' in: " + rawLog, e);
        }
        String message = matcher.group(3);
        return new LogRecord(timestamp, level, message, rawLog);
    }
}
The parse() method matches raw log lines against our regex pattern, extracts the timestamp, level, and message, and instantiates a new LogRecord.
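To see exactly what the regular expression accepts, here is a small self-contained check built around the same pattern. The class ParserRegexDemo and its extractFields() helper are hypothetical; they return the three captured groups instead of constructing a LogRecord:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParserRegexDemo {
    // Same pattern as LogParser: "yyyy-MM-dd HH:mm:ss [LEVEL] message"
    static final Pattern LOG_PATTERN =
            Pattern.compile("^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) \\[(\\w+)\\] (.+)$");

    // Returns {timestamp, level, message}, or null when the line does not match.
    static String[] extractFields(String rawLog) {
        Matcher m = LOG_PATTERN.matcher(rawLog);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = extractFields("2026-06-07 10:00:00 [ERROR] DB connection failed");
        System.out.println(parts[0]); // 2026-06-07 10:00:00
        System.out.println(parts[1]); // ERROR
        System.out.println(parts[2]); // DB connection failed

        // Level not wrapped in brackets: rejected by the regex
        System.out.println(extractFields("2026-06-07 10:00:00 ERROR DB connection failed") == null); // true
        // Correct shape but a level outside the enum: the regex still accepts it
        System.out.println(extractFields("2026-06-07 10:00:00 [DEBUG] cache warmed") != null); // true
    }
}
```

The last case shows that the regex only checks the shape of the level token; rejecting levels other than INFO, WARN, and ERROR is left to LogLevel.valueOf(), which throws IllegalArgumentException.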
Main Driver Class
This driver class exercises the log analyzer engine: it ingests raw logs, builds the index, composes queries, and prints the search results.
import java.util.Set;

public class Main {
    public static void main(String[] args) {
        LogIndex index = new LogIndex();

        // Ingest raw logs
        String[] logs = {
            "2026-06-07 10:00:00 [INFO] User Alice logged in successfully",
            "2026-06-07 10:01:05 [ERROR] Database connection pool exhausted",
            "2026-06-07 10:02:10 [WARN] Memory usage exceeded 85 percent threshold",
            "2026-06-07 10:03:15 [ERROR] Failed to establish database socket link",
            "2026-06-07 10:04:20 [INFO] User Bob logged in successfully"
        };

        System.out.println("Ingesting and parsing raw logs...");
        for (String rawLog : logs) {
            LogRecord record = LogParser.parse(rawLog);
            index.addRecord(record);
        }
        System.out.println("Logs indexed successfully.");

        System.out.println("\n==========================================");
        System.out.println("Scenario 1: Querying Logs containing 'database' AND level 'ERROR'");
        System.out.println("==========================================");
        // Build query: keyword 'database' AND level 'ERROR'
        LogQuery query1 = new AndQuery(
            new KeywordQuery("database"),
            new LevelQuery(LogLevel.ERROR)
        );
        Set<LogRecord> results1 = query1.execute(index);
        for (LogRecord record : results1) {
            System.out.println(record);
        }

        System.out.println("\n==========================================");
        System.out.println("Scenario 2: Querying Logs containing 'Alice' OR 'Bob'");
        System.out.println("==========================================");
        // Build query: keyword 'Alice' OR keyword 'Bob'
        LogQuery query2 = new OrQuery(
            new KeywordQuery("Alice"),
            new KeywordQuery("Bob")
        );
        Set<LogRecord> results2 = query2.execute(index);
        for (LogRecord record : results2) {
            System.out.println(record);
        }
    }
}
The main() driver parses logs into structured records, populates the inverted index, builds composite query trees, and displays the matching log entries.
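Because every query returns a HashSet, the loops in main() print matches in no particular order, which need not match ingestion order. If chronological output is wanted, one option is to sort results by their timestamp strings. This is a sketch that assumes all logs share one time zone and use the fixed-width, zero-padded 24-hour yyyy-MM-dd HH:mm:ss format, since such strings compare lexicographically in time order. Rec is a hypothetical stand-in for LogRecord:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ChronologicalSortSketch {
    // Hypothetical stand-in for LogRecord with only the fields needed here.
    static final class Rec {
        final String timestamp;
        final String raw;
        Rec(String timestamp, String raw) { this.timestamp = timestamp; this.raw = raw; }
        String getTimestamp() { return timestamp; }
    }

    // Zero-padded "yyyy-MM-dd HH:mm:ss" strings sort lexicographically in time order,
    // so a plain String comparator yields chronological output. Records with equal
    // timestamps keep whatever relative order the HashSet iteration produced.
    static List<Rec> sortByTime(Set<Rec> results) {
        List<Rec> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparing(Rec::getTimestamp));
        return sorted;
    }

    public static void main(String[] args) {
        Set<Rec> results = new HashSet<>();
        results.add(new Rec("2026-06-07 10:03:15", "Failed to establish database socket link"));
        results.add(new Rec("2026-06-07 10:01:05", "Database connection pool exhausted"));
        for (Rec r : sortByTime(results)) {
            System.out.println(r.timestamp + " " + r.raw);
        }
        // 2026-06-07 10:01:05 Database connection pool exhausted
        // 2026-06-07 10:03:15 Failed to establish database socket link
    }
}
```

For mixed time zones or other formats, parsing the timestamp into a java.time type before comparing would be the safer choice.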