Apache Pulsar
Cloud-native distributed messaging and streaming platform
💫 What is Apache Pulsar?
Apache Pulsar is a cloud-native, distributed messaging and streaming platform. Originally developed at Yahoo and later contributed to the Apache Software Foundation, Pulsar combines the queuing semantics of traditional messaging systems with the durable, replayable log model of streaming platforms.
Pulsar separates the serving layer (brokers) from the storage layer (Apache BookKeeper), so each can be scaled independently, and it provides multi-tenancy, geo-replication, and tiered storage out of the box. This design makes it well suited to large-scale, mission-critical applications in cloud environments.
Because it supports both messaging and streaming workloads, Pulsar can serve as a single platform for everything from simple pub-sub messaging to stream processing, which makes it attractive to organizations that want to consolidate their messaging infrastructure.
⭐ Key Features
Multi-Tenancy
Native multi-tenant architecture with namespace isolation, authentication, and authorization.
Geo-Replication
Built-in cross-datacenter replication for disaster recovery and global data distribution.
Tiered Storage
Automatic data offloading to cheaper storage (S3, GCS) for cost-effective long-term retention.
Schema Evolution
Built-in schema registry with support for Avro, JSON, and Protobuf schema evolution.
Functions
Lightweight compute framework for stream processing without external dependencies.
Unified Messaging
Single platform for streaming, queuing, and pub-sub messaging patterns.
🏗️ Pulsar Architecture
Pulsar's architecture is built around the separation of concerns, with distinct layers for serving, storage, and coordination. This design enables independent scaling and provides operational flexibility.
Brokers
Stateless serving layer that handles client connections and message routing.
BookKeeper
Distributed log storage system that provides durability and replication.
ZooKeeper
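Metadata store and coordination service that holds cluster configuration and tracks which broker serves which topics. Recent Pulsar releases can also use other metadata stores, such as etcd.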
Key Concepts
Hierarchy:
- • Tenant: Top-level namespace for multi-tenancy
- • Namespace: Administrative unit within tenant
- • Topic: Message feed within namespace
- • Subscription: Named cursor on a topic that one or more consumers attach to (roughly equivalent to a Kafka consumer group)
Example Structure:
company/app1/user-events
company/app2/order-stream
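The paths above are short forms. A fully qualified Pulsar topic name also carries a persistence scheme, as in `persistent://tenant/namespace/topic`. The sketch below shows one way to parse and print such names. `TopicName` and `parse_topic` are hypothetical helpers written for illustration, not part of any Pulsar client library, and the choice of `persistent` as the default scheme is an assumption of this sketch.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Parts of a fully qualified topic name (hypothetical helper)."""
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Parse a short (tenant/namespace/topic) or fully qualified topic name."""
    domain = "persistent"  # assumed default when no scheme is given
    if "://" in name:
        domain, name = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = name.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)


print(parse_topic("company/app1/user-events"))
# persistent://company/app1/user-events
```

Keeping tenant and namespace as separate fields mirrors the hierarchy: access control and quotas are typically applied at those levels rather than per topic.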
📬 Messaging Models
📡 Publish-Subscribe (Streaming)
Each subscription maintains its own cursor, so multiple subscriptions can process the same message stream independently and at their own pace. This model is well suited to event streaming and real-time analytics.
Characteristics:
- • Independent cursors per subscription
- • Message replay capability
- • Multiple consumers per subscription (with the Shared, Failover, or Key_Shared subscription types)
- • Fan-out message delivery
Use Cases:
- • Event streaming
- • Real-time analytics
- • Audit logging
- • Notification systems
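The cursor model described above can be illustrated with a small in-memory stand-in. This is a sketch only: the `Topic` class and its methods are hypothetical and do not use the Pulsar client. In a real cluster the cursor is stored durably and advances on acknowledgment, whereas here it advances on `receive` to keep the example short.

```python
from typing import Optional


class Topic:
    """In-memory stand-in for a topic: an append-only log plus cursors."""

    def __init__(self) -> None:
        self.log = []       # messages in publish order
        self.cursors = {}   # subscription name -> index of next message

    def publish(self, message: str) -> None:
        self.log.append(message)

    def subscribe(self, subscription: str, from_earliest: bool = False) -> None:
        # A new subscription starts at the end of the log unless asked otherwise.
        start = 0 if from_earliest else len(self.log)
        self.cursors.setdefault(subscription, start)

    def receive(self, subscription: str) -> Optional[str]:
        """Return the next message for this subscription and advance its cursor."""
        position = self.cursors[subscription]
        if position >= len(self.log):
            return None
        self.cursors[subscription] = position + 1
        return self.log[position]

    def rewind(self, subscription: str, position: int = 0) -> None:
        """Replay: move only this subscription's cursor back."""
        self.cursors[subscription] = position


topic = Topic()
topic.subscribe("analytics", from_earliest=True)
topic.subscribe("audit", from_earliest=True)
for event in ("login", "purchase"):
    topic.publish(event)

print(topic.receive("analytics"))  # login
print(topic.receive("analytics"))  # purchase
print(topic.receive("audit"))      # login (audit has its own cursor)
topic.rewind("analytics")
print(topic.receive("analytics"))  # login again after the rewind
```

Because the log is shared and only the cursors differ, adding another subscription costs one more integer rather than another copy of the data, which is what makes fan-out and replay cheap in this model.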
📋 Message Queuing
With a Shared subscription, messages are distributed among the attached consumers in round-robin fashion, which balances the load across multiple worker instances.
Characteristics:
- • Round-robin message distribution
- • Load balancing across consumers
- • Message acknowledgments
- • Automatic failover
Use Cases:
- • Task processing
- • Job queues
- • Background processing
- • Work distribution
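Round-robin distribution and failover can be sketched without a broker. The helpers below, `dispatch_round_robin` and `redeliver`, are hypothetical names used for illustration. They model the idea that a failed consumer's unacknowledged messages go back to the remaining consumers, and they do not reproduce Pulsar's actual dispatcher.

```python
from itertools import cycle


def dispatch_round_robin(messages, consumers):
    """Assign each message to the next consumer in turn."""
    assignments = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        assignments[name].append(message)
    return assignments


def redeliver(assignments, failed, acked):
    """Hand the failed consumer's unacknowledged messages to the survivors."""
    pending = [m for m in assignments.pop(failed) if m not in acked]
    if not assignments:
        raise RuntimeError("no consumers left to take over pending messages")
    for message, name in zip(pending, cycle(list(assignments))):
        assignments[name].append(message)
    return assignments


jobs = [f"job-{i}" for i in range(1, 7)]
work = dispatch_round_robin(jobs, ["worker-a", "worker-b", "worker-c"])
print(work["worker-b"])  # ['job-2', 'job-5']

# worker-b acknowledged job-2 and then disconnected; job-5 is redelivered.
work = redeliver(work, failed="worker-b", acked={"job-2"})
print(work["worker-a"])  # ['job-1', 'job-4', 'job-5']
```

Acknowledgments are what make the failover safe: only messages that were never acknowledged are handed out again, so completed work is not repeated.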
⚙️ Pulsar Functions
Pulsar Functions is a lightweight compute framework that enables stream processing directly within the Pulsar cluster. Functions can be written in Java, Python, or Go, and are automatically managed by the Pulsar runtime.
Function Features
- • Lightweight: No external frameworks required
- • Multi-language: Java, Python, Go support
- • Scaling: Run multiple instances of a function by setting its parallelism
- • State Management: Built-in state store
Function Types
- • Transform: Message transformation and enrichment
- • Filter: Conditional message processing
- • Aggregate: Windowed aggregations
- • Route: Content-based message routing
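Three of the function types above can be sketched as plain Python callables and exercised locally, with no cluster. The names `enrich`, `only_errors`, `route_by_region`, and `RecordingContext` are hypothetical. The sketch assumes that a function returning `None` produces no output message, and that routing goes through a context object with a `publish(topic_name, message)` method, as in the Python Functions SDK, where the function would be a `process(self, input, context)` method on a `Function` subclass. Windowed aggregation is left out.

```python
import json


def enrich(input):
    """Transform: parse a JSON event and add a derived field."""
    event = json.loads(input)
    event["total"] = event["quantity"] * event["unit_price"]
    return json.dumps(event)


def only_errors(input):
    """Filter: pass error lines through and drop everything else.

    Assumption: returning None means no output message is published.
    """
    return input if "ERROR" in input else None


class RecordingContext:
    """Hypothetical test double standing in for the Functions context object."""

    def __init__(self):
        self.published = []

    def publish(self, topic_name, message):
        self.published.append((topic_name, message))


def route_by_region(input, context):
    """Route: choose the output topic from the message content."""
    event = json.loads(input)
    context.publish(f"orders-{event['region']}", input)


print(enrich('{"quantity": 3, "unit_price": 2}'))
print(only_errors("INFO started"))

ctx = RecordingContext()
route_by_region('{"region": "eu", "id": 7}', ctx)
print(ctx.published[0][0])  # orders-eu
```

Keeping the logic in ordinary functions like these makes it testable before deployment. The topic names used here are placeholders.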
Function Example
def process(input):
    # Transform the message
    result = input.upper()
    return result

pulsar-admin functions create \
  --py transform.py \
  --classname transform \
  --inputs input-topic \
  --output output-topic
🔌 Connectors & Ecosystem
Pulsar provides a rich ecosystem of connectors for integrating with external systems, along with compatibility options for Kafka clients and support for several other protocols.
Source Connectors
Import data from external systems
- • Debezium (CDC)
- • Kafka Connect
- • File sources
- • Database sources
Sink Connectors
Export data to external systems
- • Elasticsearch
- • HDFS/S3
- • JDBC databases
- • Cloud storage
Protocol Support
Multiple protocol compatibility
- • Native Pulsar protocol
- • Kafka API compatibility (through a client wrapper or the Kafka-on-Pulsar protocol handler)
- • WebSocket
- • HTTP REST
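As an illustration of the WebSocket option, the sketch below builds a producer endpoint and the JSON envelope for one message. It assumes the endpoint layout `/ws/v2/producer/persistent/{tenant}/{namespace}/{topic}` and a Base64-encoded `payload` field, as described in the Pulsar WebSocket API documentation. The host, port, and helper names are placeholders, and actually sending the message would need a WebSocket client library, which is not shown.

```python
import base64
import json
from urllib.parse import quote


def producer_url(host, tenant, namespace, topic, port=8080):
    """Build the WebSocket producer endpoint for a persistent topic."""
    path = "/".join(quote(part, safe="") for part in (tenant, namespace, topic))
    return f"ws://{host}:{port}/ws/v2/producer/persistent/{path}"


def encode_message(payload: bytes, properties=None, context=None):
    """Wrap a payload in the JSON envelope; the payload is Base64-encoded."""
    envelope = {"payload": base64.b64encode(payload).decode("ascii")}
    if properties:
        envelope["properties"] = properties
    if context is not None:
        envelope["context"] = context  # echoed back to correlate the ack
    return json.dumps(envelope)


print(producer_url("localhost", "company", "app1", "user-events"))
# ws://localhost:8080/ws/v2/producer/persistent/company/app1/user-events
print(encode_message(b"hello"))
# {"payload": "aGVsbG8="}
```

Base64 keeps arbitrary binary payloads safe inside a JSON text frame, at the cost of roughly a third more bytes on the wire.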
🎯 Common Use Cases
Enterprise Applications
- • Financial Services: High-throughput trading and risk management systems
- • E-commerce: Order processing and inventory management
- • Gaming: Real-time player event processing
- • IoT Platforms: Sensor data collection and analysis
Cloud-Native Patterns
- • Multi-Region Deployments: Global data replication and disaster recovery
- • Serverless Computing: Event-driven function execution
- • Microservices: Service-to-service communication
- • Edge Computing: Distributed data processing at the edge
🔗 Related Technologies
Pulsar fits into the broader messaging and streaming ecosystem alongside these technologies:
Alternative Platforms
- • Apache Kafka - Distributed streaming platform
- • RabbitMQ - Traditional AMQP broker
- • Redis - In-memory pub/sub messaging
Complementary Tools
- • Apache BookKeeper - Distributed log storage
- • Apache Flink - Stream processing engine
- • Presto/Trino - SQL query engine