This demo shows how to use Service Fabric and Event Hubs to build a highly scalable, highly reliable ingestion pipeline for a click analytics system. The pipeline receives events sent by a client-side script running in a web page and persists them to Append Blobs, one for each user session.
MIT License
Click analytics is a type of web analytics that focuses on user interactions with a web page: clicks (point-and-click), mouse tracking, data entry, and all the other user actions that constitute the first stage of the conversion funnel. Commonly, click analytics focuses on on-site analytics: the editor of a web site uses it to determine how the site performs with regard to where its users are clicking. Click analytics may happen in real time or in "unreal" time, depending on the type of information sought. Typically, front-page editors on high-traffic news media sites want to monitor their pages in real time to optimize the content or understand user navigation patterns. Editors, designers, and other stakeholders may analyze user actions over a wider time frame to help them assess the performance of writers, design elements, advertisements, and so on.
Data about clicks may be gathered in at least two ways. Ideally, a click is "logged" when it occurs, which requires some functionality that picks up the relevant information at the moment of the event. Usually, a client-side script running in the web page collects events and sends them to an ingestion pipeline that tracks and stores all the actions that happen during a user session. Once the user visit is complete, the data ingestion system sends a message to a hot path analytics system to process the session data. Alternatively, user sessions can be analyzed periodically by a cold path analytics system.
The solution adopts the Claim Check design pattern to hand the user session data to the downstream analytics system: instead of sending the session data itself, it sends a message that contains the URI of the blob where the session data has been recorded.
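The Claim Check flow described above can be illustrated with a minimal in-memory sketch. It is not the demo's implementation: the dictionary and the deque merely stand in for Append Blobs and the message queue, and the base URI, the `sessionId`, `type` and `blobUri` field names, and the `sessionEnd` event type are all hypothetical.

```python
import json
from collections import deque

BASE_URI = "https://example.invalid/usersessions"  # hypothetical container URI

blobs = {}          # blob URI -> list of serialized events (stand-in for Append Blobs)
messages = deque()  # stand-in for the queue read by the analytics system


def blob_uri(session_id):
    """Build the (hypothetical) URI of the blob that stores one user session."""
    return f"{BASE_URI}/{session_id}.log"


def ingest(event):
    """Append the event to its session blob; on session end, enqueue the claim check."""
    uri = blob_uri(event["sessionId"])
    blobs.setdefault(uri, []).append(json.dumps(event))
    if event["type"] == "sessionEnd":
        # The message carries only the blob URI, never the session payload.
        messages.append(json.dumps({"sessionId": event["sessionId"], "blobUri": uri}))


def analyze_next():
    """Downstream side: redeem the claim check by loading the blob it points to."""
    claim = json.loads(messages.popleft())
    return [json.loads(line) for line in blobs[claim["blobUri"]]]


if __name__ == "__main__":
    ingest({"sessionId": "s1", "type": "click", "target": "#buy"})
    ingest({"sessionId": "s1", "type": "sessionEnd"})
    print(analyze_next())
```

Keeping the payload out of the message means the message stays small no matter how long the session was, and the analytics system reads the data only when it is ready to process it.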
The demo uses a lightweight gateway service based on OWIN that receives events from client-side scripts over HTTP and sends them to an Event Hub. The Event Hub decouples the receiving part of the ingestion pipeline from the persist operation (separation of concerns), so the two can be scaled independently. Another microservice reads events from the Event Hub and persists them to an Append Blob, one for each user session.
Processing of a user session starts only when the client-side script sends a special event that marks the end of the user visit, and page views within a user session are processed sequentially, so Append Blobs are a good fit for storing user actions. Append Blobs are similar to block blobs in that they are made up of blocks, but they are optimized for append operations, which makes them useful for logging scenarios. An append blob can contain up to 50,000 blocks of up to 4 MB each, for a total size of slightly more than 195 GB (4 MB × 50,000). For more information on Append Blobs, see Understanding Block Blobs, Append Blobs, and Page Blobs.
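The Append Blob size limit quoted above follows from simple arithmetic, sketched here. The helper names are hypothetical, and `fits_in_one_blob` assumes that every append operation commits exactly one block; the text does not say how the service batches events, so treat that as an assumption.

```python
MAX_BLOCKS = 50_000   # maximum number of blocks in one append blob
MAX_BLOCK_MIB = 4     # maximum size of one block, in MiB


def max_append_blob_gib():
    """Upper bound on append blob size: 50,000 blocks of 4 MiB, expressed in GiB."""
    return MAX_BLOCKS * MAX_BLOCK_MIB / 1024


def fits_in_one_blob(append_count):
    """True if a session with this many append operations stays within the block limit.

    Assumption: each append operation commits exactly one block.
    """
    return append_count <= MAX_BLOCKS


if __name__ == "__main__":
    print(max_append_blob_gib())
    print(fits_in_one_blob(50))
```

Under that assumption, the block count rather than the total size is likely to be the first limit a session hits, because individual click events are far smaller than 4 MB.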
The following picture shows the architecture design of the application.
The Service Fabric application receives page view events over HTTP, relays them through the Event Hub, and persists them to Append Blobs, one for each user session. The application is composed of three services:
Note: one of the advantages of stateless services over stateful services is that by specifying InstanceCount="-1" in the ApplicationManifest.xml, you can create an instance of the service on each node of the Service Fabric cluster. When the cluster uses Virtual Machine Scale Sets to scale the number of cluster nodes up and down, the number of instances of a stateless service then scales automatically with the autoscaling rules and traffic conditions.
To maximize the throughput of the demo, apply the following recommendations:
Make sure to replace the following placeholders in the project files below before deploying and testing the application on the local development Service Fabric cluster or before deploying the application to your Service Fabric cluster on Microsoft Azure.
This list contains the placeholders that need to be replaced before deploying and running the application:
App.config file in the UserEmulator project:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <add key="url" value="http://localhost:8085/usersessions;
http://[NAME].[REGION].cloudapp.azure.com:8085/usersessions"/>
    <add key="userCount" value="20"/>
    <add key="eventInterval" value="2000"/>
    <add key="eventsPerUserSession" value="50"/>
  </appSettings>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5"/>
  </startup>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral"/>
        <bindingRedirect oldVersion="0.0.0.0-8.0.0.0" newVersion="8.0.0.0"/>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
ApplicationParameters\Local.xml file in the PageViewTracer project:
<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             Name="fabric:/PageViewTracer"
             xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="EventProcessorHostService_InstanceCount"
               Value="1" />
    <Parameter Name="EventProcessorHostService_StorageAccountConnectionString"
               Value="[StorageAccountConnectionString]" />
    <Parameter Name="EventProcessorHostService_ServiceBusConnectionString"
               Value="[ServiceBusConnectionString]" />
    <Parameter Name="EventProcessorHostService_EventHubName"
               Value="[EventHubName]" />
    <Parameter Name="EventProcessorHostService_ConsumerGroupName"
               Value="[ConsumerGroupName]" />
    <Parameter Name="EventProcessorHostService_ContainerName"
               Value="[ContainerName]" />
    <Parameter Name="EventProcessorHostService_QueueName"
               Value="[QueueName]" />
    <Parameter Name="EventProcessorHostService_MaxRetryCount"
               Value="3" />
    <Parameter Name="EventProcessorHostService_CheckpointCount"
               Value="[CheckpointCount]" />
    <Parameter Name="EventProcessorHostService_BackoffDelay"
               Value="1" />
    <Parameter Name="PageViewWebService_InstanceCount"
               Value="1" />
    <Parameter Name="PageViewWebService_ServiceBusConnectionString"
               Value="[ServiceBusConnectionString]" />
    <Parameter Name="PageViewWebService_EventHubName"
               Value="[EventHubName]" />
    <Parameter Name="PageViewWebService_EventHubClientNumber"
               Value="[EventHubClientNumber]" />
    <Parameter Name="PageViewWebService_MaxRetryCount"
               Value="3" />
    <Parameter Name="PageViewWebService_BackoffDelay"
               Value="1" />
  </Parameters>
</Application>
ApplicationParameters\Cloud.xml file in the PageViewTracer project:
<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             Name="fabric:/PageViewTracer"
             xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="EventProcessorHostService_InstanceCount"
               Value="-1" />
    <Parameter Name="EventProcessorHostService_StorageAccountConnectionString"
               Value="[StorageAccountConnectionString]" />
    <Parameter Name="EventProcessorHostService_ServiceBusConnectionString"
               Value="[ServiceBusConnectionString]" />
    <Parameter Name="EventProcessorHostService_EventHubName"
               Value="[EventHubName]" />
    <Parameter Name="EventProcessorHostService_ConsumerGroupName"
               Value="[ConsumerGroupName]" />
    <Parameter Name="EventProcessorHostService_ContainerName"
               Value="[ContainerName]" />
    <Parameter Name="EventProcessorHostService_QueueName"
               Value="[QueueName]" />
    <Parameter Name="EventProcessorHostService_MaxRetryCount"
               Value="3" />
    <Parameter Name="EventProcessorHostService_CheckpointCount"
               Value="[CheckpointCount]" />
    <Parameter Name="EventProcessorHostService_BackoffDelay"
               Value="1" />
    <Parameter Name="PageViewWebService_InstanceCount"
               Value="-1" />
    <Parameter Name="PageViewWebService_ServiceBusConnectionString"
               Value="[ServiceBusConnectionString]" />
    <Parameter Name="PageViewWebService_EventHubName"
               Value="[EventHubName]" />
    <Parameter Name="PageViewWebService_EventHubClientNumber"
               Value="[EventHubClientNumber]" />
    <Parameter Name="PageViewWebService_MaxRetryCount"
               Value="3" />
    <Parameter Name="PageViewWebService_BackoffDelay"
               Value="1" />
  </Parameters>
</Application>
ApplicationManifest.xml file in the PageViewTracer project:
<?xml version="1.0" encoding="utf-8"?>
<ApplicationManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     ApplicationTypeName="PageViewTracerType"
                     ApplicationTypeVersion="1.0.1"
                     xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="EventProcessorHostService_InstanceCount" DefaultValue="-1" />
    <Parameter Name="EventProcessorHostService_StorageAccountConnectionString" DefaultValue="" />
    <Parameter Name="EventProcessorHostService_ServiceBusConnectionString" DefaultValue="" />
    <Parameter Name="EventProcessorHostService_EventHubName" DefaultValue="" />
    <Parameter Name="EventProcessorHostService_ConsumerGroupName" DefaultValue="" />
    <Parameter Name="EventProcessorHostService_ContainerName" DefaultValue="usersessions" />
    <Parameter Name="EventProcessorHostService_QueueName" DefaultValue="usersessions" />
    <Parameter Name="EventProcessorHostService_MaxRetryCount" DefaultValue="3" />
    <Parameter Name="EventProcessorHostService_CheckpointCount" DefaultValue="100" />
    <Parameter Name="EventProcessorHostService_BackoffDelay" DefaultValue="1" />
    <Parameter Name="PageViewWebService_InstanceCount" DefaultValue="-1" />
    <Parameter Name="PageViewWebService_ServiceBusConnectionString" DefaultValue="" />
    <Parameter Name="PageViewWebService_EventHubName" DefaultValue="" />
    <Parameter Name="PageViewWebService_EventHubClientNumber" DefaultValue="32" />
    <Parameter Name="PageViewWebService_MaxRetryCount" DefaultValue="3" />
    <Parameter Name="PageViewWebService_BackoffDelay" DefaultValue="1" />
  </Parameters>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="EventProcessorHostServicePkg"
                        ServiceManifestVersion="1.0.0" />
    <ConfigOverrides>
      <ConfigOverride Name="Config">
        <Settings>
          <Section Name="EventProcessorHostConfig">
            <Parameter Name="StorageAccountConnectionString"
                       Value="[EventProcessorHostService_StorageAccountConnectionString]" />
            <Parameter Name="ServiceBusConnectionString"
                       Value="[EventProcessorHostService_ServiceBusConnectionString]" />
            <Parameter Name="EventHubName"
                       Value="[EventProcessorHostService_EventHubName]" />
            <Parameter Name="ConsumerGroupName"
                       Value="[EventProcessorHostService_ConsumerGroupName]" />
            <Parameter Name="ContainerName"
                       Value="[EventProcessorHostService_ContainerName]" />
            <Parameter Name="QueueName"
                       Value="[EventProcessorHostService_QueueName]" />
            <Parameter Name="CheckpointCount"
                       Value="[EventProcessorHostService_CheckpointCount]" />
            <Parameter Name="MaxRetryCount"
                       Value="[EventProcessorHostService_MaxRetryCount]" />
            <Parameter Name="BackoffDelay"
                       Value="[EventProcessorHostService_BackoffDelay]" />
          </Section>
        </Settings>
      </ConfigOverride>
    </ConfigOverrides>
  </ServiceManifestImport>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="PageViewWebServicePkg"
                        ServiceManifestVersion="1.0.1" />
    <ConfigOverrides>
      <ConfigOverride Name="Config">
        <Settings>
          <Section Name="PageViewWebServiceConfig">
            <Parameter Name="ServiceBusConnectionString"
                       Value="[PageViewWebService_ServiceBusConnectionString]" />
            <Parameter Name="EventHubName"
                       Value="[PageViewWebService_EventHubName]" />
            <Parameter Name="EventHubClientNumber"
                       Value="[PageViewWebService_EventHubClientNumber]" />
            <Parameter Name="MaxRetryCount"
                       Value="[PageViewWebService_MaxRetryCount]" />
            <Parameter Name="BackoffDelay"
                       Value="[PageViewWebService_BackoffDelay]" />
          </Section>
        </Settings>
      </ConfigOverride>
    </ConfigOverrides>
  </ServiceManifestImport>
  <DefaultServices>
    <Service Name="EventProcessorHostService">
      <StatelessService ServiceTypeName="EventProcessorHostServiceType"
                        InstanceCount="[EventProcessorHostService_InstanceCount]">
        <SingletonPartition />
      </StatelessService>
    </Service>
    <Service Name="PageViewWebService">
      <StatelessService ServiceTypeName="PageViewWebServiceType"
                        InstanceCount="[PageViewWebService_InstanceCount]">
        <SingletonPartition />
      </StatelessService>
    </Service>
  </DefaultServices>
</ApplicationManifest>
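Before deploying, it can help to check that no placeholder was left behind in the ApplicationParameters files. The following is a minimal sketch and not part of the demo: the function name and the sample document are hypothetical, and the check treats any parameter whose whole value is a bracketed token as unreplaced. It is meant only for Local.xml and Cloud.xml, because ApplicationManifest.xml uses bracketed values on purpose to reference application parameters. It does not cover the [NAME] and [REGION] tokens embedded in the App.config URL.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by the ApplicationParameters files shown above.
FABRIC_NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}

# A value that consists entirely of one bracketed token, e.g. "[EventHubName]".
PLACEHOLDER = re.compile(r"^\[[^\[\]]+\]$")


def unreplaced_parameters(root):
    """Return the names of parameters whose Value is still a [Placeholder]."""
    return [
        param.get("Name")
        for param in root.findall("sf:Parameters/sf:Parameter", FABRIC_NS)
        if PLACEHOLDER.match(param.get("Value", ""))
    ]


# Hypothetical, shortened parameters file used only to exercise the check.
SAMPLE = """
<Application Name="fabric:/PageViewTracer"
             xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="PageViewWebService_EventHubName" Value="[EventHubName]" />
    <Parameter Name="PageViewWebService_MaxRetryCount" Value="3" />
  </Parameters>
</Application>
"""

if __name__ == "__main__":
    print(unreplaced_parameters(ET.fromstring(SAMPLE)))
```

To check a real file, pass `ET.parse(path).getroot()` instead of the parsed sample.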