Create Alarms for Known Issues from Metrics
Monitoring for Application Outcomes
Ryn Brandish wants to understand when there are application issues and is concerned about security. He would like metrics for errors and warnings. ExampleCorp has limitations on the image size that it can successfully process. While this issue does not happen frequently, it does not make customers happy. Being aware of the error rate for submitted images will allow the Business and Development teams to decide whether increasing the image size limit should be a priority. Currently most warnings are related to security issues. Ryn would like visibility into how often they happen.
5.1 Create a log filter metric based on the defined threshold for the error
- Click on Log Groups in the breadcrumb trail at the top of the screen
- Select the application.log group by clicking on the radio button next to it, and then in the Actions list choose Create Metric Filter.
- Enter "ActiveStorage::InvariableError" (including the double quotes) for Filter Pattern. This is a known error that we will cause later in the lab.
- Click on Next
- For Filter name, type ActiveStorage-InvariableError
- On Metric namespace list, select ApplicationLogMetrics
- Enter ImageError as the Metric Name
- Enter 1 as the Metric Value
- Click on Next
- Review the filter, and then click Create Metric Filter
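The console steps above can also be expressed as a single CloudWatch Logs `PutMetricFilter` request. The following is a minimal sketch, not part of the lab itself. It assumes the log group is literally named `application.log` (your log group name may differ) and that you pass in a boto3 CloudWatch Logs client; the helper names `metric_filter_request` and `create_metric_filter` are hypothetical.

```python
from typing import Any, Dict


def metric_filter_request(log_group_name: str = "application.log") -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter call.

    The default log group name is an assumption; use the name shown in
    your own Log Groups list.
    """
    return {
        "logGroupName": log_group_name,
        "filterName": "ActiveStorage-InvariableError",
        # The double quotes are part of the pattern, as in the console step.
        "filterPattern": '"ActiveStorage::InvariableError"',
        "metricTransformations": [
            {
                "metricNamespace": "ApplicationLogMetrics",
                "metricName": "ImageError",
                # The API expects the metric value as a string.
                "metricValue": "1",
            }
        ],
    }


def create_metric_filter(logs_client: Any, log_group_name: str = "application.log") -> None:
    """Create the metric filter using a client that exposes put_metric_filter."""
    logs_client.put_metric_filter(**metric_filter_request(log_group_name))
```

With boto3 installed and credentials configured, `create_metric_filter(boto3.client("logs"))` would send the request. Keeping the request builder separate from the call makes it easy to inspect the parameters before anything is created.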
5.2 Create an alarm for the error when the metric threshold is crossed
- Select the ActiveStorage-InvariableError metric filter by clicking on the radio button next to it, and then choose Create alarm.
- Change Period to 10 seconds
- Under Conditions
- Choose Static for threshold type
- For Alarm condition, choose Greater/Equal, and enter 1 as the threshold value
- Under Additional Configuration, leave 1 out of 1 for datapoints to alarm
- Select Treat missing data as good (not breaching threshold) for Treat missing data as
- Under Notification
- Select In alarm for Alarm state trigger
- Choose Select an existing SNS topic for Select an SNS topic
- In the send a notification to…, choose the SNS topic that you created
- Click Next
- Enter ImageErrorAlarm for Name
- Click Next
- Review the alarm, and click Create Alarm
NOTE: You should see that the alarm has insufficient data with a 1 next to INSUFFICIENT in the navigation menu. This will change to OK within a few seconds.
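For reference, the alarm settings chosen in this step map onto the parameters of the CloudWatch `PutMetricAlarm` API. The sketch below is illustrative only. The `Sum` statistic is an assumption (the lab does not state which statistic the console selects), the SNS topic ARN must be supplied by you, and the helper names `image_error_alarm_request` and `create_image_error_alarm` are hypothetical.

```python
from typing import Any, Dict


def image_error_alarm_request(sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": "ImageErrorAlarm",
        "Namespace": "ApplicationLogMetrics",
        "MetricName": "ImageError",
        # Assumption: the lab does not name the statistic to use.
        "Statistic": "Sum",
        # Period in seconds, matching the console step.
        "Period": 10,
        # "1 out of 1" datapoints to alarm.
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        # Static threshold: Greater/Equal than 1.
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # "Treat missing data as good (not breaching threshold)".
        "TreatMissingData": "notBreaching",
        # Notify the existing SNS topic when the alarm enters the ALARM state.
        "AlarmActions": [sns_topic_arn],
    }


def create_image_error_alarm(cloudwatch_client: Any, sns_topic_arn: str) -> None:
    """Create the alarm using a client that exposes put_metric_alarm."""
    cloudwatch_client.put_metric_alarm(**image_error_alarm_request(sns_topic_arn))
```

With boto3 installed, `create_image_error_alarm(boto3.client("cloudwatch"), topic_arn)` would create the alarm, where `topic_arn` is the ARN of the SNS topic that you created earlier.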
5.3 Generate error logs through user activity
- Navigate to the ExampleCorp application using the URL that you made a note of earlier (CloudFormation Output).
- Enter email@example.com for email
- Enter Password123 for Password
- Click Login
- Click Upload Image
- Click Select Image
- Find break_app.jpg in your filesystem
- Click Upload
NOTE: The application will start exhibiting issues, and the application will eventually fail to load.
5.4 Review logs to identify error
- Navigate to the CloudWatch Console
- Click Logs in the navigation menu
- Click on application.log log group - this is from /opt/ExampleCorp/log/application.log as configured earlier in the CloudWatch agent configuration.
- Click on the Search Log Group button
- Enter ERROR (case sensitive) and hit enter.
- You will see errors ending in ActiveStorage::InvariableError. This is an error that is generated when a user uploads a non-image file. This is a known issue, but the Dev team has not had cycles to address it.
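The same search can be run outside the console with the CloudWatch Logs `FilterLogEvents` API. The following is a minimal sketch that assumes you pass in a boto3 CloudWatch Logs client and the name of your log group; the helper name `search_log_group` is hypothetical.

```python
from typing import Any, Iterator


def search_log_group(
    logs_client: Any, log_group_name: str, pattern: str = "ERROR"
) -> Iterator[str]:
    """Yield the message of every log event that matches the filter pattern.

    Term matching in filter patterns is case sensitive, as in the console search.
    """
    paginator = logs_client.get_paginator("filter_log_events")
    pages = paginator.paginate(logGroupName=log_group_name, filterPattern=pattern)
    for page in pages:
        for event in page.get("events", []):
            yield event["message"]
```

With boto3 installed, `for line in search_log_group(boto3.client("logs"), "application.log"): print(line)` would print the matching messages, assuming the log group is named `application.log`. The paginator is used because results can span several pages.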
Build a Monitoring Plan: Alerting and Response
Monitoring for Operational Outcomes
Sansa Bailish is focused on outages, reliability, and getting better sleep. There is a known issue with image trends: when the instances are rebooted, the application doesn't restart. Too frequently, Sansa has received a late-night call to get online and restart the application. The user experience lead is very frustrated with the downtime associated with these incidents. They need to be detected sooner and resolved faster.
Sansa is looking for a monitoring solution to detect the incident (the reboot event), and a way to trigger an automated recovery.