What is Small Data?

Big Data has been discussed at length for years, yet there is still little information about Small Data. Let’s find out what is what!

Small Data — Definition

Small Data is data based on the analysis of a single person’s behavior. Big Data, by contrast, is based on the analysis of behavioral patterns across a large audience.

Unlike Big Data, Small Data can often be collected and processed by one person. In eCommerce it is used in everyday tasks. Once you determine a customer’s gender, physiological parameters, social status and other personal information, you can segment your customers by these criteria and run marketing campaigns more efficiently.

Small Data is all about everyday tasks: for instance, you use it when you aggregate a database of email addresses and mail out your digests.
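
The segmentation described above needs very little machinery. Here is a minimal sketch in Python, assuming a hypothetical customer list; the field names (`email`, `gender`, `status`) and the records themselves are invented for illustration and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical customer records; field names and values are invented.
customers = [
    {"email": "anna@example.com", "gender": "f", "status": "student"},
    {"email": "boris@example.com", "gender": "m", "status": "parent"},
    {"email": "clara@example.com", "gender": "f", "status": "parent"},
]


def segment_by(records, field):
    """Group email addresses by the value of one field."""
    segments = defaultdict(list)
    for record in records:
        segments[record[field]].append(record["email"])
    return dict(segments)


# Addresses of everyone in the "parent" segment, ready for a targeted mailing.
by_status = segment_by(customers, "status")
print(by_status["parent"])
```

The same function can segment by any other field, for example `segment_by(customers, "gender")`.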

Small Data and Big Data — What Do Retailers Need?

Small Data has a better-known counterpart: Big Data, the technology trend of recent years. Retailers certainly need Big Data. It helps them see the overall picture of the market and the latest trends, predict competition and demand for a given product, and drive up sales by better understanding the needs of different target groups based on their behavior. There are many ways to apply Big Data, and recommendation systems are a great example: they use Big Data and complex algorithms to predict the needs and interests of thousands of customers by comparing similar behavioral patterns within a target group. Once the analysis is finished, the system can suggest the products most suitable for a specific customer.

Figure: Amazon recommends products based on Big Data analysis.
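
To make "comparing similar behavioral patterns" more concrete, here is a toy sketch of user-based collaborative filtering in Python. It is an illustration only, not how Amazon or any production system works: the purchase histories are invented, and Jaccard similarity is just one simple way to compare two customers.

```python
# Hypothetical purchase histories: customer -> set of product names.
purchases = {
    "alice": {"tent", "sleeping bag", "flashlight"},
    "bob": {"tent", "sleeping bag", "camping stove"},
    "carol": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Share of items two customers have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(customer, histories):
    """Suggest items bought by the most similar other customer."""
    own = histories[customer]
    others = [
        (jaccard(own, items), name)
        for name, items in histories.items()
        if name != customer
    ]
    if not others:
        return set()
    best_score, best_name = max(others)
    if best_score == 0.0:
        return set()  # nobody behaves similarly, so there is nothing to suggest
    return histories[best_name] - own


print(recommend("alice", purchases))
```

Real systems do the same kind of comparison across thousands or millions of customers, which is where the cost and complexity come from.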

Note that the Big Data application above is given purely as an example. Very few online retailers can process and use Big Data so easily and with such a high level of automation. In fact, if you simply collected all this data and handed it to retailers, most of them would not be able to process it, use it in their marketing campaigns or take it into account when planning to widen their product range. The reason is simple: working through this data is complicated and time-consuming. In addition, writing algorithms for Big Data analysis is expensive.

It’s much easier with Small Data. Here are three main reasons why:

  •  Small Data is accessible. To collect it you don’t need scientific methods, a database engine, complex hypotheses and so on. Small Data is all about familiar things. It needs analyzing, just like Big Data, but standard business software is enough for that.
  •  Small Data is precise. You can always update or clarify your client data yourself, for example with email or phone number verification on your website or a round of calls from your call center.
  •  Small Data is functional. Large datasets require a high level of analysis, a lot of time and special software. Besides, there is always a risk of arriving at wrong conclusions or overanalyzing. Small Data can be processed manually, and it is easier to base strategic decisions on.
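
The second point, keeping client data precise, can be sketched in a few lines of Python. This is an assumption-laden example: the records are invented, and the regular expression is deliberately loose. It only flags addresses that are obviously malformed so that someone can follow up by phone; real verification would send a confirmation message.

```python
import re

# Deliberately loose pattern: something@something.tld.
# It catches obvious typos only and does not prove the address exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def needs_follow_up(records):
    """Return the names of customers whose email looks malformed."""
    return [r["name"] for r in records if not EMAIL_PATTERN.match(r["email"])]


# Hypothetical records, invented for illustration.
records = [
    {"name": "Anna", "email": "anna@example.com"},
    {"name": "Boris", "email": "boris(at)example.com"},
    {"name": "Clara", "email": "clara@example"},
]

print(needs_follow_up(records))
```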

In his book “Principles of Big Data” (pp. 21-22, “Introduction”), Jules Berman defines the key differences between Small and Big Data. Let’s list them to fully understand the matter.

Goals
  •  Small Data: Usually designed to answer a specific question or serve a particular goal.
  •  Big Data: Usually designed with a goal in mind, but the goal is flexible and the questions posed are protean. There really is no way to completely specify what the Big Data resource will contain and how the various types of data held in the resource will be organized, connected to other data resources, or usefully analyzed.

Location
  •  Small Data: Typically contained within one institution, often on one computer, sometimes in one file.
  •  Big Data: Typically spread throughout electronic space, parceled onto multiple Internet servers located anywhere on earth.

Data structure and content
  •  Small Data: Ordinarily contains highly structured data. The data domain is restricted to a single discipline or subdiscipline. The data often comes in the form of uniform records in an ordered spreadsheet.
  •  Big Data: Must be capable of absorbing unstructured data (e.g., free-text documents, images, motion pictures, sound recordings, physical objects). The subject matter of the resource may cross multiple disciplines, and the individual data objects in the resource may link to data contained in other, seemingly unrelated, Big Data resources.

Data preparation
  •  Small Data: In many cases, the data user prepares her own data, for her own purposes.
  •  Big Data: The data comes from many diverse sources, and it is prepared by many people. People who use the data are seldom the people who have prepared the data.

Longevity
  •  Small Data: When the data project ends, the data is kept for a limited time (seldom longer than 7 years, the traditional academic life span for research data) and then discarded.
  •  Big Data: Big Data projects typically contain data that must be stored in perpetuity. Ideally, data stored in a Big Data resource will be absorbed into another resource when the original resource terminates.

Measurements
  •  Small Data: Typically, the data is measured using one experimental protocol, and the data can be represented using one set of standard units.
  •  Big Data: Many different types of data are delivered in many different electronic formats. Measurements, when present, may be obtained by many different protocols. Verifying the quality of Big Data is one of the most difficult tasks for data managers.

Reproducibility
  •  Small Data: Projects are typically repeatable. If there is some question about the quality of the data, reproducibility of the data, or validity of the conclusions drawn from the data, the entire project can be repeated, yielding a new data set.
  •  Big Data: Replication of a Big Data project is seldom feasible. In most instances, all that anyone can hope for is that bad data in a Big Data resource will be found and flagged as such.

Stakes
  •  Small Data: Project costs are limited. Laboratories and institutions can usually recover from the occasional small data failure.
  •  Big Data: Big Data projects can be obscenely expensive. A failed Big Data effort can lead to bankruptcy, institutional collapse, mass firings, and the sudden disintegration of all the data held in the resource.

Introspection
  •  Small Data: Individual data points are identified by their row and column location within a spreadsheet or database table. If you know the row and column headers, you can find and specify all of the data points contained within.
  •  Big Data: Unless the Big Data resource is exceptionally well designed, the contents and organization of the resource can be inscrutable, even to the data managers.

Analysis
  •  Small Data: In most instances, all of the data contained in the data project can be analyzed together, and all at once.
  •  Big Data: With few exceptions, such as those conducted on supercomputers or in parallel on multiple computers, Big Data is ordinarily analyzed in incremental steps. The data are extracted, reviewed, reduced, normalized, transformed, visualized, interpreted, and reanalyzed with different methods.

We hope the differences between these two types are now clear. Since our product deals with Small Data (along with Big Data) every day, let’s clarify the role it plays in REES46’s work.


Small Data and the recommendation system

When a new customer comes to a store’s website, he or she is an unknown variable for the recommendation system. No information is available yet, so neither collaborative filtering nor other Big Data processing methods will work: we don’t know this customer’s purchase history, preferences, etc.


This is why REES46 has to use Small Data as well as Big Data, relying on it in the analysis and then turning it into high-quality recommendations.


Example 1.

A customer has looked at a few items in the category “Baby and Children Clothes”. The system concludes that this customer has children and, based on the parameters of the clothes recently viewed, estimates the child’s gender and age. The current recommendations instantly adapt to the changed needs. Now the most relevant children’s clothes are shown in recommendations, selected using the parameters obtained from Small Data. This way Small Data makes up for the shortcomings of Big Data.
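
A rough sketch of this kind of inference in Python is shown below. It is not REES46’s actual algorithm: the product attributes (`gender`, `age_years`) and the rule used (most common gender, median age) are assumptions chosen to keep the example short.

```python
from collections import Counter
from statistics import median

# Hypothetical product views; attribute names and values are assumptions.
viewed = [
    {"name": "Dress with bows", "gender": "girl", "age_years": 4},
    {"name": "Pink raincoat", "gender": "girl", "age_years": 5},
    {"name": "Striped T-shirt", "gender": "unisex", "age_years": 4},
]


def infer_child_profile(views):
    """Guess a child's gender and age from the clothes a customer viewed."""
    if not views:
        return None
    # Unisex items say nothing about gender, so they are skipped here.
    genders = Counter(v["gender"] for v in views if v["gender"] != "unisex")
    gender = genders.most_common(1)[0][0] if genders else None
    age = median(v["age_years"] for v in views)
    return {"gender": gender, "age_years": age}


print(infer_child_profile(viewed))
```

The resulting profile could then be used to filter recommendations, even though the store knows nothing else about this customer.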


Example 2.

A customer has put three packs of “Pro Plan” dog food into the shopping cart. Based on the quantity, our system estimates the dog’s size (quite big). Based on the brand, it determines which price segment to recommend products from (quality product, high price). Using this data, the system will show recommendation cards with high-priced accessories and toys, increasing the probability of an additional purchase.
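
The same reasoning can be written as a couple of simple rules. The sketch below is hypothetical: the threshold of three packs, the brand table and the field names are assumptions made for illustration, not values taken from REES46.

```python
# Hypothetical brand table; a real system would derive it from the catalogue.
PREMIUM_BRANDS = {"Pro Plan"}

# Assumed threshold: three or more packs suggests a large dog.
LARGE_DOG_MIN_PACKS = 3


def infer_pet_profile(cart):
    """Guess dog size and price segment from the dog food in the cart."""
    food = [item for item in cart if item["category"] == "dog food"]
    if not food:
        return None
    packs = sum(item["quantity"] for item in food)
    size = "large" if packs >= LARGE_DOG_MIN_PACKS else "small or medium"
    premium = any(item["brand"] in PREMIUM_BRANDS for item in food)
    return {"dog_size": size, "price_segment": "high" if premium else "standard"}


cart = [{"category": "dog food", "brand": "Pro Plan", "quantity": 3}]
print(infer_pet_profile(cart))
```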

In most situations Small Data is what you need when you don’t have the time and resources to process Big Data, or when there is no information about the client yet.

Small Data is the best choice for tangible results here and now.
