My Alexa Device Has 1.28GB of Personal Info on Me – Here’s What’s In It

Due to a number of data privacy laws that have been enacted in recent years, most Big Tech companies, such as Amazon, are legally obligated to send to you all of the personal information they have stored on you. Of course they don't go out of their way to advertise that you can request all of your data, but they're legally required to give it to you if you ask. So I requested all of the personal data Amazon had on me, and it turned out to be a whopping 1.28GB of info.

After sifting through all 1.28GB of information that Amazon had on me, I was able to gather some meaningful insight on how Amazon uses certain data on customers as it relates to reviews, advertising, and more.

How to Request the Personal Data that Amazon Keeps on You

To request all of the personal information that Amazon keeps on you, go to https://www.amazon.com/gp/privacycentral/dsar/preview.html

Amazon will send you a link within 30 days containing all of this data (for me, it took about a week for them to email me this link).

After requesting your personal data, you will be given a link to a downloads page like above.

The link will contain mostly CSV files (comma delimited files that you can open in most spreadsheet applications) but also other data such as audio files.

Keep in mind this is only for your personal customer account on Amazon. You can't request this data for your Seller Central account.

More Than 90% of the Information Stored Is from Amazon Alexa

Of the 1.28GB of information that Amazon had on me, more than 1.1GB was for Amazon Alexa. Sorting through the data, I found that our family had initiated 21,687 contacts with Alexa, and Amazon has stored 21,687 audio files, each containing anywhere from 1 to 18 seconds of audio recorded after we initiated contact.
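For those who want to check this split on their own export, here is a minimal Python sketch that totals the size of each top-level folder and reports its share of the whole. The folder path `amazon-export` is a placeholder of mine, and I'm assuming the download has been unzipped into one folder per data category; your layout may differ.

```python
from pathlib import Path


def folder_sizes(export_root):
    """Return {top-level folder name: total bytes} for an unzipped export."""
    root = Path(export_root)
    sizes = {}
    for path in root.rglob("*"):
        if path.is_file():
            # Attribute each file to the first folder below the export root.
            parts = path.relative_to(root).parts
            top = parts[0] if len(parts) > 1 else "(root)"
            sizes[top] = sizes.get(top, 0) + path.stat().st_size
    return sizes


def size_shares(sizes):
    """Convert byte totals into fractions of the whole export."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()} if total else {}


if __name__ == "__main__":
    # "amazon-export" is a placeholder path, not a name Amazon uses.
    shares = size_shares(folder_sizes("amazon-export"))
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {share:.1%}")
```

Audio files dominate the byte count, so a folder can account for most of the export's size while holding a fairly ordinary number of records.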

The data that Amazon keeps from Alexa isn't that applicable to sellers, but it is interesting and a little bit worrying, so let's begin here.

Amazon has stored audio recordings of every single time we initiated contact with it (by saying “Alexa”). Note, due to the size of the folders, there were three separate folders for all of our recordings, each containing thousands of recordings.

Amazon has also transcribed each and every one of these recordings.

Amazon transcribes every single “conversation” you have with your Alexa and stores this data on you.

Here's a sample of one of these recordings (one of my daughter's first words was, of course, “Alexa”):

Many people are uncomfortable with their Amazon Alexa listening in on their every conversation. And after viewing all of the data that Amazon has stored on me from my interactions with our family's Alexa, I would say that many of the concerns people have about their privacy and their Alexa are very well-founded.

Amazon is likely using all of the data it collects from our family's interactions with Alexa to try and paint a complete profile of our family. How many people are in our family? Are there any kids? And so forth.

For our family, most of the requests to our Alexa were rather trivial and benign. “Alexa, stop” was our most common request, and perhaps our most interesting request was to play “My Little Pony” (we have a 6-year-old daughter, but incidentally, Songs of Ponyville is a great soundtrack even for adults). However, this data will certainly only become more exhaustive and interesting as the sophistication of Amazon Alexa increases.

Most of our transcriptions with Alexa consisted of music requests and using our smart home devices, but this data will likely become more exhaustive over time as the sophistication of Alexa advances.
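A top-20 list like this can be built with a simple frequency count. The sketch below is one way to do it in Python, assuming you have already pulled the transcription text out of the Alexa CSV into a list of strings. I'm not naming the file or column because they may differ between exports.

```python
from collections import Counter


def top_requests(transcripts, n=20):
    """Return the n most common requests after light normalization."""
    counts = Counter()
    for text in transcripts:
        # Lowercase and collapse whitespace so "Alexa, stop" and
        # "alexa,  stop " are counted as the same request.
        cleaned = " ".join(text.lower().split())
        if cleaned:
            counts[cleaned] += 1
    return counts.most_common(n)
```

The normalization here is deliberately light. Grouping requests by intent (for example, all music requests together) would take manual tagging or something more sophisticated.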

The fact that Amazon is transcribing and storing all of our conversations with Alexa brings back many of the concerns that were brought up after the National Security Agency leaks in 2013 about the US government's national surveillance program and collecting metadata (data that provides information about other data, but not the content of the actual data) on people's phone calls.  In the case of Alexa though, it's a private company collecting this data, and Amazon is going one step further and storing the actual content of the data itself.

The good news is that Amazon is not recording and storing every conversation you have when you have Alexa plugged in. For instance, when I mention to my wife that we're running out of AAA batteries without mentioning Alexa, Amazon does not appear to be collecting and storing this data. However, one can still dream of a world in which people can opt out of Amazon transcribing and storing all of our interactions with our Alexa.

The Other Information that Amazon Stores on You and Seller Implications

The information Amazon has stored on me as a result of my interactions with Alexa is interesting, but not of much importance to sellers. And this being an Amazon seller blog, I wanted to look at some of the data Amazon stores on me (and all customers) and what the implications to sellers are from this.

Amazon had data stored on me in dozens of different categories on everything from my interactions with Alexa, my Kindle reading history, and Amazon Prime Video usage. However, the most interesting data that Amazon is storing relates back to search, reviews, and advertising. Let's take a look at some of this data.

I should point out that individual personalized data is, of course, by no means indicative of all of the data that Amazon collects. Amazon is surely collecting a small ocean's worth of data on a non-personalized level. Nevertheless, this individual personalized data does give us some insight into what types of data Amazon is collecting and allows us to speculate on what they're doing with it.

Search

Perhaps the most interesting thing of all of the 1.28GB of personal info that Amazon holds on me pertains to search. Amazon tracks 67 different metrics on every single search a customer ever performs. I dived into that data, and it revealed some interesting statistics.

Here are the 67 data points that Amazon is tracking (see the image below). For those following along at home with their own data they've requested from Amazon, you will find this in Search-Data.Product-Metrics.csv.

Amazon tracks data on every single customer search on 67 different metrics.
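If you'd rather list the metrics in your own file than squint at a screenshot, a few lines of Python will print the header row. This is a minimal sketch: it assumes the file is a standard comma-delimited CSV with the column names in its first row, and the `utf-8-sig` encoding in the usage comment is my guess, not something Amazon documents.

```python
import csv


def csv_headers(fileobj):
    """Return the header row of an open CSV file as a list of column names."""
    reader = csv.reader(fileobj)
    return next(reader, [])


# Usage (the file name is the one from the export; the encoding is an assumption):
#
# with open("Search-Data.Product-Metrics.csv", newline="", encoding="utf-8-sig") as f:
#     headers = csv_headers(f)
#     print(len(headers))
#     for name in headers:
#         print(name)
```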

Some of these data points are inconsequential and not very surprising, but there are some juicy tidbits from diving into these metrics. And again, just because a certain metric isn't tracked in one's personal search history (such as whether one used a promotional code or not) doesn't mean Amazon isn't tracking it separately from personal search history, such as in some meta form.

Here's a look at some of the important data points that Amazon is storing from one's search history.

Tracking of External Links

Amazon is tracking when a product link click or search initiates from an external site or device.

It's pretty well-known that Amazon loves external traffic, and there's been some pretty strong suspicion that Amazon rewards this in favorable search rankings for products that bring a lot of customers from external sources (in fact, we have an entire course on External traffic for Amazon).

Amazon is tracking what they call Search From External Site and Is From External Link (Y/N). However, it's not quite clear what Amazon considers an external link or site (for example, whether a Kindle link click is included or an Amazon Associates link). External ads such as those from Google are definitely not included though (as we'll see below).

Regardless of whatever Amazon considers external, it's clear it's an important data point for Amazon, reinforcing the belief that external traffic sources are something Amazon tracks and values.

Add to Carts & Number of Clicked Items

Another widely held belief is that Add-to-Cart (ATC) is a significant ranking factor for Amazon's search algorithm. Looking at personal search data, there's some further confirmation of this as Amazon is tracking Add-to-Carts (ATC) on two metrics: First Added Item and Number of Items Added to Cart.

For reference purposes, on my account, the average number of ATCs was 0.164 and the average number of purchases was 0.124 based on 2500 searches. I'm working to compare this to a selection of other Amazon accounts to see how this data computes over a wider number of Amazon accounts.

Amazon is also tracking the number of items each customer clicks after performing a particular search. For my Amazon account, the average number of item clicks after a search was 1.35 items. This value on its own is not that valuable, but I'm working to compile this data across a number of buyer accounts to get a better picture of how many items a customer clicks on average. I suspect my behavior (and my wife's, who's using my Prime account) is somewhat normal, and most people click only slightly more than one item after each search.
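For anyone who wants to compute the same per-search averages from their own file, here is a rough Python sketch. The column name in the usage comment is taken from the metric names mentioned above and may not match the CSV header exactly. Treating blank cells as zero is my assumption, so check how your file represents searches with no activity.

```python
def column_average(rows, column):
    """Mean of a numeric column across all rows, treating blanks as zero."""
    total = 0.0
    count = 0
    for row in rows:
        value = (row.get(column) or "").strip()
        # Assumption: an empty cell means the event did not happen.
        total += float(value) if value else 0.0
        count += 1
    return total / count if count else 0.0


# Usage with csv.DictReader (column name is an assumption):
#
# import csv
# with open("Search-Data.Product-Metrics.csv", newline="", encoding="utf-8-sig") as f:
#     rows = list(csv.DictReader(f))
# print(column_average(rows, "Number of Items Added to Cart"))
```

Because every search counts in the denominator, an average of 0.164 add-to-carts means most searches end with nothing added to the cart at all.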

Maximum Purchase Price and Free Items

Amazon appears to only be tracking the highest ticket item purchased after a search and also is tracking paid vs. free items.

Amazon appears to only track the maximum purchase price of an item purchased after a search and whether or not an item was purchased for free.

Incidentally, Amazon does not track whether or not a promotion was used or claimed on an individual search level. Promotion use is tracked in other datasets, specifically order history, so there's a very good chance promotion use is still a search ranking factor, although it's possible that it's weighted less heavily.

Also, to get completely speculative, one could argue that Amazon doesn't care so much about whether a promotion code is used and/or for what percentage discount, but rather what the ultimate purchase price is. So, for example, if Seller 1 sells a garlic press for $9 with no discount and Seller 2 sells a garlic press for $20 with a 50% discount, Seller 2's orders will still be weighted more heavily than Seller 1's, because the ultimate purchase price is $10 compared to $9. Again, this is sheer speculation.
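The arithmetic behind that garlic press comparison, as a tiny Python sketch. It only illustrates the final price calculation and says nothing about how Amazon actually weights orders.

```python
def effective_price(list_price, discount_fraction=0.0):
    """Price the customer actually pays after a fractional discount."""
    return list_price * (1 - discount_fraction)


seller_1 = effective_price(9.00)         # $9 with no discount
seller_2 = effective_price(20.00, 0.50)  # $20 with a 50% discount
print(seller_1, seller_2)  # prints "9.0 10.0"
```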

Given everything that has happened with rebates recently, perhaps this data gives some hope that giving a discount on Amazon isn't as bad from an SEO perspective as originally thought.

Reformulated or Abandoned Queries

Amazon is tracking whether a search query was reformulated (i.e., reworded) or abandoned (i.e., the customer left without purchasing).

Amazon is tracking whether a customer abandons a search query or needs to reformulate it (i.e., reword it) altogether. This probably has little importance on an individual ASIN level, but in aggregate, it gives Amazon important information about how well the search results page for a given query is performing.

First Search Domain/Is First Search From an External Ad

Amazon is also tracking the first domain that a search took place from. Normally, this is just Amazon, but there are two other primary sources: Google and, if you're using them, link redirection tools like pixelfy.me.

The fact that Amazon is clearly tracking when a search took place on a link redirection tool shouldn't be surprising (this is a common data point even in Google Analytics). But it does go to show how much footprint sellers leave whenever we use certain marketing strategies. Again, especially in light of rebates being banned, be careful that the traffic you send to Amazon is behaving in a natural way.

Google being a referrer of First Search Domain traffic is also interesting as it goes to show how much advertising Amazon does off of Amazon. In fact, in my case, Google ads accounted for roughly 1.5% of my entire search history.
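A percentage like that comes from counting how often each value appears in the first-search-domain column. Here is a minimal Python sketch of that tally. It assumes the rows were loaded with `csv.DictReader`, and the column name `First Search Domain` in the usage comment is my assumption about the header.

```python
from collections import Counter


def value_shares(rows, column):
    """Fraction of rows taken by each distinct value in one column."""
    counts = Counter((row.get(column) or "").strip().lower() for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


# Usage (column name is an assumption):
#
# shares = value_shares(rows, "First Search Domain")
# for domain, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
#     print(f"{domain or '(blank)'}: {share:.1%}")
```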

How does an Amazon search initiated on Google work? Well, for example, when you search for oversized wallplate covers on Google, Amazon will have a variety of ads on that page. Some of them point to an individual ASIN page, while others go directly to an Amazon search results page with the keywords we used on Google.

The bottom ad on this Google search results page goes directly to an Amazon search results page with the exact keywords we searched for on Google.

Interestingly, Amazon does not consider these ads external links (as mentioned at the top of this section), so Amazon is clearly distinguishing between external links it pays for and those which it doesn't.

Searches that result from a paid ad are not categorized as external links in Amazon's search history it keeps for customers.

The share of my searches that started from these ads goes to show just how much money Amazon is spending on external advertising, often at no cost to sellers.

Reviews

Always of utmost concern for every Amazon seller is reviews.

We know that an Amazon customer's reviewer profile has a lot of bearing on how reviews are weighted and shown. For those following along at home, this review data is located in Retail.CustomerReviews.ReviewsVersions1.csv.

The information is basically what you would expect with Amazon tracking the review title, description, rating, and so forth. Individual Feature Ratings (e.g., fitment, comfort, value for money) are broken into separate reviews.

You'll also note that there is a moderation field with reviews either being approved or disapproved. It seems all of my reviews have been approved. One could speculate that a high rate of disapproved reviews for any individual reviewer could affect the visibility of all of their reviews.

Also interesting is how Amazon records helpful votes for each individual review. This data is hidden in Retail.LightWeightInteractions.csv. Amazon records how many helpful votes each review gets but also how many unhelpful votes it gets. It would seem likely that Amazon is considering what percentage of unhelpful votes a reviewer's entire profile gets, so downvoting an individual review has only so much power in the context of that reviewer's entire review profile.
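To make that speculation concrete, here is a small Python sketch of a profile-level unhelpful-vote percentage. This is purely illustrative and is not Amazon's formula. The function name and the input format (one pair of helpful and unhelpful counts per review) are hypothetical.

```python
def unhelpful_share(vote_counts):
    """Share of all votes across a reviewer's reviews that were 'unhelpful'.

    vote_counts: iterable of (helpful, unhelpful) pairs, one per review.
    """
    helpful = unhelpful = 0
    for h, u in vote_counts:
        helpful += h
        unhelpful += u
    total = helpful + unhelpful
    return unhelpful / total if total else 0.0
```

Under a measure like this, a single downvote on one review barely moves a reviewer who has many helpful votes elsewhere, which is the point made above.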

Amazon is also recording how many times you vote a review as helpful or inappropriate. There has been some speculation in the seller community that helpful votes are anonymous, but this data negates that. Also, much like Amazon considers age and reviewer profile when considering how to weight a review, Amazon likely considers age and reviewer profile when weighting helpful votes.

Advertising

Finally, let's take a look at the advertising data that Amazon is tracking.

For those looking at their own data, you can find your advertising data within the Advertising folder in your personal data. From here, there are several categories of data that Amazon is tracking, as seen below.

Amazon has just a few different categories of data it collects regarding customers and advertising.

First, Amazon is putting each customer in various buckets, presumably based on purchase history. One would imagine that Amazon uses this audience data to determine which ads to show to customers.

Amazon is also tracking which advertisers you clicked (unfortunately, in my data, I saw that I mistakenly clicked one of my company's own ads at least once; there's a buck or two I'll never get back). Interestingly, Amazon only seems to be tracking which overall Seller Account's ads you clicked, not the individual ASIN. This would seem to suggest that Amazon isn't retargeting an individual ASIN after you simply clicked an ad, which is rather surprising if it's indeed true.

Final Thoughts

Being able to download all of the personal data that Amazon holds on us is a bit of a treasure trove of insight into the inner workings of Amazon. Seeing thousands of audio recordings that Amazon holds of myself, my wife, and my daughter is slightly unsettling even for someone like myself who isn't normally concerned about companies holding too much personal information.

However, the main concern for sellers isn't about whether we're heading for a dystopian 1984 reality. The main concern, of course, is about how we can use the gigabytes of information Amazon holds on us to sell more products. Seeing the 67 data points Amazon is tracking on every customer's own search history sheds some light on how Amazon considers external links and promotions, and how it is tracking our off-Amazon marketing tactics.

Dave Bryant

Dave Bryant has been importing from China for over 10 years and has started numerous product brands. He sold his multi-million dollar ecommerce business in 2016 and created another 7-figure business within 18 months. He's also a former Amazon warehouse employee of one week.
