Google Analytics 4: Data and Structure
EPA content related to Google Analytics is changing.
Google's legacy platform, Universal Analytics (UA), will reach end of life in mid-2023 with a one-time extension for contracting clients such as EPA until July 1, 2024. See KB article.
In these Web Analytics pages, content for Universal Analytics is marked "Google Universal Analytics (legacy)."
Content for the new platform, Google Analytics 4, is marked "Google Analytics 4 (GA4)."
Google Analytics (GA) is a Web analytics tool. It collects Web traffic metrics using Google Tag Manager (GTM). Google Tag Manager is in the WebCMS template. Anyone building an application should use the look and feel template to grab the GTM code. The code sends Users and Event data to Google servers for processing. Every time a user visits a webpage, the embedded code will collect information about how that user interacted with the page. This process of using embedded code to collect Web traffic metrics is called page tagging. GA is one of many Web analytics tools that use page tagging.
- How Does it Work?
- Cookies and Privacy
- Page Tagging v. Log File Analysis
- How is Content Organized in Google Analytics?
How Does it Work?
GA works by setting various cookies in visitors’ Web browsers. Cookies are small files that Web servers place in Web browsers, often for the purpose of tracking internet activity. Each file consists of a text message that is sent back to the server each time the browser requests a Web page. Session cookies remain in a Web browser only until the browser is closed or has been inactive for a specified amount of time. Persistent cookies, on the other hand, remain after a browser session ends. Some persistent cookies, including those set by GA, are set to expire after a specific amount of time passes (e.g., six months or two years).
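The difference between session and persistent cookies can be sketched in a few lines of Python. This is only an illustration using the standard library's http.cookies module. The cookie names and values (visit_id, visitor_id) are hypothetical and are not the cookies GA actually sets. The sketch assumes that a cookie with neither an Expires nor a Max-Age attribute is a session cookie.

```python
from http.cookies import SimpleCookie


def classify_cookies(set_cookie_header: str) -> dict:
    """Map each cookie name in the header to 'session' or 'persistent'."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    kinds = {}
    for name, morsel in jar.items():
        # A cookie with no Expires or Max-Age attribute lasts only
        # for the browser session.
        has_lifetime = bool(morsel["expires"]) or bool(morsel["max-age"])
        kinds[name] = "persistent" if has_lifetime else "session"
    return kinds


# Hypothetical header values, for illustration only.
session_header = "visit_id=abc123; Path=/"
# 63072000 seconds = two years (2 * 365 * 24 * 60 * 60).
persistent_header = "visitor_id=xyz789; Path=/; Max-Age=63072000"

print(classify_cookies(session_header))
print(classify_cookies(persistent_header))
```

The first header carries no lifetime, so the browser would discard that cookie when the session ends. The second carries a two-year Max-Age, so it would survive browser restarts until it expires or the visitor deletes it.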
Cookies and Privacy
Neither EPA nor Google collects any personally identifiable information (PII) about Users of the EPA website.
GA uses first-party persistent cookies, which the government classifies as Tier 2 persistent cookies. Tier 2 persistent cookies do not collect any PII and are permissible for use by federal agencies. For more information on how the Office of Management and Budget (OMB) defines persistent cookies, see OMB M-10-22, Guidance for Online Use of Web Measurement and Customization Technologies.
Unless you first opt out by blocking the cookies, GTM will automatically set a persistent cookie in the browser of the computer or mobile device you are using to access the EPA website. Visitors can choose not to allow GA to track their Web activity by changing their browser settings. Modern browsers have options to block the kinds of cookies set by GA.
Cookie Deletion and Return Users
Page Tagging v. Log File Analysis
An alternative approach to collecting Web traffic metrics through page tagging is log file analysis. This method entails downloading server log files for processing in an analytics software program. It does not use page tagging or cookies.
Since server logs record all server transactions, including activity from Web crawlers and bots, software is needed to filter out non-human activity. While log file software does filter out known crawlers and bots and those that self-identify, not all crawlers and bots self-identify, making it difficult to filter all non-human activity.
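As a rough illustration of this filtering step, the Python sketch below parses a few lines in the common "combined" access log format and drops requests whose user agent self-identifies as a bot. The log lines, IP addresses, and the short BOT_MARKERS list are made up for the example. Real log analysis software relies on much larger, maintained lists.

```python
import re

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative and deliberately incomplete list of bot markers.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_self_identified_bot(user_agent: str) -> bool:
    """True when the user agent string announces itself as a bot."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def human_requests(lines):
    """Return parsed log entries that do not self-identify as bots."""
    kept = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and not is_self_identified_bot(match.group("agent")):
            kept.append(match.groupdict())
    return kept


# Hypothetical log lines. The third is an undeclared bot that presents
# a browser-like user agent, so this filter cannot catch it.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Mar/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '192.0.2.20 - - [01/Mar/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.30 - - [01/Mar/2024:10:00:09 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for entry in human_requests(SAMPLE_LOG):
    print(entry["ip"], entry["request"])
```

The self-identified crawler is removed, but the third request is kept because nothing in its user agent marks it as automated. This is the limitation described above.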
On the other hand, GA page tags have to be activated by GTM, which runs as JavaScript in the browser and which the vast majority of Web spiders and bots do not process. The few bots that do process the tags may represent only a small amount of traffic, but they should be considered as part of any Web traffic analysis.
Log files may not capture all human activity either, since repeat visits to the same Web page can cause the page to be retrieved from the browser’s cache. Web servers do not typically record such transactions.
Where page tagging holds a major advantage over log file analysis, however, is in the breadth of traffic metrics that can be collected and the ad hoc customizations that are available.
Page tagging solutions use cookies to track Return Sessions and other Session-based metrics, such as Pages per Session and Session Duration. Log file software relies on IP addresses to calculate Session-based metrics, which can be problematic since many large companies have dynamic IP addresses that can change after or even during a Session. Even though some Users delete their cookies prior to returning to the same website, page tagging is viewed as providing a more accurate calculation of Session-based metrics.
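The effect of a changing IP address on Session counts can be shown with a small Python sketch. It groups hits into Sessions by a visitor key and starts a new Session after a period of inactivity. The 30-minute timeout, the field names, and the sample hits are assumptions made for illustration. This is not how GA or any particular log file tool is implemented.

```python
from datetime import datetime, timedelta

# Assumed inactivity window after which a new Session starts.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(hits):
    """Count Sessions per visitor key.

    hits: iterable of (visitor_key, timestamp) pairs.
    Returns a dict mapping each visitor key to its Session count.
    """
    last_seen = {}
    sessions = {}
    for key, timestamp in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(key)
        # First hit for this key, or a gap longer than the timeout.
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions[key] = sessions.get(key, 0) + 1
        last_seen[key] = timestamp
    return sessions


# One hypothetical visitor whose IP address changes mid-Session.
start = datetime(2024, 3, 1, 10, 0)
raw_hits = [
    {"ip": "192.0.2.10", "cookie_id": "c1", "time": start},
    {"ip": "192.0.2.10", "cookie_id": "c1", "time": start + timedelta(minutes=5)},
    {"ip": "198.51.100.7", "cookie_id": "c1", "time": start + timedelta(minutes=10)},
]

by_ip = count_sessions((hit["ip"], hit["time"]) for hit in raw_hits)
by_cookie = count_sessions((hit["cookie_id"], hit["time"]) for hit in raw_hits)

print("Sessions keyed by IP address:", sum(by_ip.values()))
print("Sessions keyed by cookie:", sum(by_cookie.values()))
```

Keyed by IP address, the single visit is split into two Sessions. Keyed by the cookie identifier, it is counted once.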
Page tagging tools also provide user-friendly segmentation and custom reporting options. This allows you to quickly calculate the number of Users, Sessions, or Events from segments, such as:
- Mobile devices
- Geographical locations (down to the City level)
- Social media referrals
- Searches that included particular keywords
These calculations can be executed quickly in the interface. In contrast, customizations to log file reports may require reprocessing the raw log files, or even custom configuration of the software itself, and in most cases these customizations are not possible with log file analysis at all.
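Conceptually, a segment is a filter applied to the collected data before Users, Sessions, and Events are counted. The Python sketch below shows that idea on a handful of made-up event records. The field names (user_id, session_id, device, city) are hypothetical and do not reflect GA's actual data model or interface.

```python
def segment_totals(events, predicate):
    """Count distinct Users, distinct Sessions, and Events in a segment."""
    selected = [event for event in events if predicate(event)]
    return {
        "users": len({event["user_id"] for event in selected}),
        # A Session is identified here by the (user, session number) pair.
        "sessions": len({(event["user_id"], event["session_id"]) for event in selected}),
        "events": len(selected),
    }


# Hypothetical event records, for illustration only.
EVENTS = [
    {"user_id": "u1", "session_id": 1, "device": "mobile", "city": "Denver"},
    {"user_id": "u1", "session_id": 1, "device": "mobile", "city": "Denver"},
    {"user_id": "u1", "session_id": 2, "device": "desktop", "city": "Denver"},
    {"user_id": "u2", "session_id": 1, "device": "mobile", "city": "Boston"},
]

mobile = segment_totals(EVENTS, lambda event: event["device"] == "mobile")
denver = segment_totals(EVENTS, lambda event: event["city"] == "Denver")

print("Mobile devices:", mobile)
print("Denver:", denver)
```

Because the segment is only a filter over data that has already been collected, a new segment can be defined and calculated without collecting or reprocessing anything.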
The main advantage of log file analysis is the internal control of data. Whereas page tagging usually requires third-party hosting and processing of data, log file analysis enables organizations to process metrics without relying on outside parties. Depending on the organization, this can be a major selling point.
For analysis purposes, it is most important to find the tool that meets your needs, understanding that all analytics tools will provide differing calculations, and stick with that tool as you compare metrics month over month and year over year.
How is Content Organized in Google Analytics?
TSSMS areas have no practical application in EPA’s Google Analytics (GA) Account. GA follows an organizational hierarchy that consists of:
- An Account
- Properties
- Data Streams
GA accounts have the ability to track multiple Properties, which represent unique entities, such as websites or applications. It is important to think of each property as representing a distinct entity, in total, because the metrics of different properties are inherently separate. For example, one account might include three different properties for an agency’s
- Public websites
- Intranet websites
- Staging environment
Each property has its own set of Data Streams, which are the sources of information that feed your Google Analytics property. Data streams allow you to compare and consolidate user behavior across different platforms. Data streams are part of the new GA4 structure. In Universal Analytics, there was a unique property for each source of data, and views and filters were used to adjust reports and configure data collection to your needs.
GA4 no longer uses views. Now there is a single analytics property that contains data streams. You can apply filters directly to a report to adjust how you track and view your data from each stream. For web data streams, GA4 sets up tracking for a few key metrics through each stream's enhanced measurement settings. The web data stream will automatically track page views, scrolls, file downloads, video plays, site search, and outbound link clicks.
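The Account, Property, and Data Stream hierarchy can be pictured as a simple nested data structure. The Python sketch below is only a conceptual model. The class names, the account name, and the stream names are hypothetical, and it does not use or represent any Google API.

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    name: str
    platform: str  # for example "web", "ios", or "android"


@dataclass
class Property:
    name: str
    data_streams: list[DataStream] = field(default_factory=list)


@dataclass
class Account:
    name: str
    properties: list[Property] = field(default_factory=list)


def streams_by_property(account: Account) -> dict:
    """Map each property name to the names of its data streams."""
    return {
        prop.name: [stream.name for stream in prop.data_streams]
        for prop in account.properties
    }


# Hypothetical account mirroring the three example properties above.
account = Account(
    name="Example Agency",
    properties=[
        Property(
            "Public websites",
            [
                DataStream("Main website", "web"),
                DataStream("Mobile app", "android"),
            ],
        ),
        Property("Intranet websites", [DataStream("Intranet", "web")]),
        Property("Staging environment", [DataStream("Staging site", "web")]),
    ],
)

for property_name, stream_names in streams_by_property(account).items():
    print(property_name, "->", stream_names)
```

In this model, one property can hold several streams from different platforms. This is the change from Universal Analytics, where each source of data had its own property.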