B-Project

What is a negative binomial regression?

A negative binomial regression is a statistical method of analysis that estimates the relationship between one or more independent variables and a dependent variable that is a discrete count. It does not assume that the values are normally distributed, and it allows for overdispersion, where the variance of the counts exceeds their mean.

What are cognitive / consumer biases?

Basically, our brains often use mental shortcuts, called 'heuristics', during problem-solving to simplify decision-making processes. However, sometimes these mental shortcuts can lead us astray, causing us to make decisions that are inaccurate, illogical, or 'irrational.' These 'deviations' are known as cognitive biases.

What are the downfalls of using consumer biases?

Consumer biases are based on personal experience, so they are a lot more difficult to predict, even with the right data. That's why we did not base our hypotheses solely on consumer biases.

Why was the retail_price variable included as a control variable? What are the control variables? Why use one?

In a regression, a control variable is a variable that is included in the model not because it is the focus of the research but because it may also influence the dependent variable. Including it holds its effect constant statistically, so the relationship between the independent variables of interest and the dependent variable can be understood better. As an independent variable changes, you can then see the corresponding change in the dependent variable with the control variable accounted for. The variable retail_price was included as a control variable because our research did not primarily investigate its relation to the dependent variables, but we still wanted to account for its potential effects. Our research doesn't revolve around price, but we think it's an important factor, so we didn't want to ignore it.

Problems with stepwise regressions?

The individual steps could be too narrow. A narrow regression isn't a realistic representation, because in real life all the independent factors are present at the same time.

How did you pick the four brands that you focused on? Why did you pick them?

First of all, we wanted to pick four brands, two of which were sustainable and two of which were not. We also wanted one premium and one non-premium brand in each group. So in all we had four brands: two sustainable and two not, and two premium and two not. We have an interest in sustainable shopping, so we already had some background on which brands are sustainable and which are considered fast fashion. Sustainable fashion in this context generally refers to either making the clothes ethically, meaning no sweatshops, or making the clothes in a way that does not harm or has a low impact on the environment, such as using deadstock fabrics. Fast fashion brands are the opposite: they value producing a large quantity of clothing quickly and staying on trend over ethical and environmental factors. So first, we did our research and chose some brands that we thought were clearly sustainable and some that were clearly not. Next, Thredup made the premium aspect very easy, because it categorizes each brand as either premium or not. So we separated the brands we had preselected into premium and non-premium brands. The last factor that we looked into was sample size. We decided to go with brands that had a similar number of items on the platform, because if one brand was receiving hundreds of items a day while another was only receiving a few, this would have skewed our results. Since it was just the two of us and we had to manually code the items, including newly added ones, we went with brands that had a smaller number of items on the platform so that the data collection was manageable. That left us with four brands: Reformation, Alternative Apparel, Target, and Ganni.

How did you pick the consumer biases for each hypothesis?

First, we did some initial research on what motivates consumers to buy secondhand, and we found motivations such as bargain-hunting, uniqueness, fashionability, and sustainability. We also found a bit of literature about the popularity of secondhand luxury goods, and this, along with the other motivations, is where we got the initial logic behind our 5 hypotheses. We could have just let this research support our hypotheses, but we wanted to add another layer that would connect this initial research to buying behavior on the platform, and that is where consumer biases and phenomena came in. We researched a large number of biases and found five that supported our hypotheses, which I think really made our research interesting. For example, it would have sufficed to talk about how secondhand consumers love bargain-hunting and finding a deal and then connect that to hypothesis 1, which is that items with a larger price gap between initial retail price and reselling price will see more buyer behavior. But by adding the anchoring effect, which describes how people are influenced by the initial information that they receive, we were able to connect this to how ThredUp displays the initial retail price alongside the reselling price, adding even more support to our hypothesis.

Can you give a brief summary of how you analyzed the data?

First, before running any regression, we used a correlation matrix to check whether anything could impact our results. This is where we saw a strong correlation between retail price, resale price, and price gap and decided to drop the resale price variable. Secondly, we created frequency tables and histograms to get an overview of how our data was dispersed. Here, we found that our data for the increase in favs and shopping cart activity was right-skewed and overdispersed, which is why we chose negative binomial regression for the first two stages. We chose a logistic regression for the final stage since sold is a binary variable. Next, for each stage, we conducted a stepwise regression, a series of narrow regressions in which we isolated independent variables, adding and removing one at each step, to test how they influenced each dependent variable. Finally, we conducted a consolidated regression for each stage, which included all relevant independent variables.

Why are the estimated coefficients of the logistic regression presented in odds ratios?

For ease of interpretation, all estimated coefficients of the logistic regression are presented in the form of odds ratios. Odds ratios are only an option for the logistic regression, where they are obtained by exponentiating the coefficients, and they are considered easier to interpret than the raw coefficients.
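
As a rough illustration of where odds ratios come from: a logistic regression estimates coefficients on the log-odds scale, and exponentiating a coefficient gives the odds ratio. The sketch below uses made-up coefficients, not values from our tables, and the function names are only for illustration.

```python
import math


def odds(probability: float) -> float:
    """Odds of an event: P(event) divided by P(no event)."""
    return probability / (1.0 - probability)


def odds_ratio(coefficient: float) -> float:
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(coefficient)


# Hypothetical coefficients, chosen only to show the three possible cases.
for beta in (-0.5, 0.0, 0.5):
    print(f"coefficient {beta:+.1f} -> odds ratio {odds_ratio(beta):.3f}")
```

A negative coefficient gives an odds ratio below 1, a coefficient of 0 gives exactly 1, and a positive coefficient gives an odds ratio above 1.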

Would you consider this a positivist or interpretivist study?

Positivism, as I understand it, holds that certain knowledge is based on natural phenomena and their properties and relations, and that information collected and interpreted through reason and logic is the only source of all certain knowledge. It is also my understanding that positivist studies usually use scientific quantitative methods, while interpretivist studies use humanistic qualitative methods. Therefore, I believe our study is a positivist study.

Do you think that the results would be different if you had conducted research over a longer period of time?

I believe that the independent variables that were only significant for the first level of commitment, liking, could have also become significant on other levels given more time. This is mainly due to the popularity funnel: the more likes an item gets, the more likely buyer behavior is to occur on all three levels. For example, the large price gap is significant only for the liking stage, but given more time, this popularity funnel could have elevated it to be significant at the shopping cart stage or even the sold stage.

What are the pros and cons of the quantitative method you used? What did it not show?

I think if we had to choose just one method, a quantitative method is the best method for this study. Since we conducted deductive research, our goal was to test hypotheses, and that approach is better suited to working with quantitative data, using the data and statistics to test and confirm these hypotheses. If our aim had been to get a deeper understanding of secondhand shopping motivations rather than to test our hypotheses, then a qualitative approach would have been better suited, but this is not the case. That being said, a few qualitative interviews could have supplemented and further validated our quantitative results nicely, if we had had more time and resources to do so.

How can you apply your research to the business world? For new goods online stores?

I think our results not only deal with online secondhand shopping platforms specifically but can be generalized to any online platform selling goods. We found risk compensation and the bandwagon effect to be the consumer biases with the strongest influence on shopping behavior, so businesses can figure out ways to add features and functionalities to their website to exploit these biases. For example, a rating system plays into both biases, if the ratings are good of course. Good ratings would reduce the perceived risk of buying the product, and a high number of ratings would indicate that it is a popular item. Tags are another option, and they are not a feature distinct to Thredup or secondhand platforms. Websites can use tags like "bestseller" and "back in stock" to indicate popularity, while tags such as "100% cashmere" or "highest rated" would indicate a low-risk buy.

How do you think the brands you chose affected your results?

I think the brands we chose definitely affected our results. Since the four brands we chose had fewer items compared to other brands, we could assume that fewer people are sending them in because fewer people are buying these brands, meaning they are less popular. If we had picked brands that were more popular and thus had a larger number of items on the platform, we would probably have seen more buying activity. The problem with this, though, is that there are not very many truly sustainable clothing brands out there, and those that exist are not very big or well known, whereas fast fashion brands are larger and extremely well known for the most part. Therefore, we had to pick less popular fast fashion brands, such as Ganni and Target.

Would it have been better to create sub-hypotheses for Hypothesis 2, instead of putting all the tags and uniqueness together?

If Hypothesis 2 were separated into sub-hypotheses, such as one for each tag, one for either tag, and one for uniqueness, the one for the new with tags tag would be partially confirmed because of its significance for items sold. I think separating them could have been a good idea because some of the tags imply more than just uniqueness. For example, the new with tags tag could be associated with risk compensation and hypothesis 5, and the price drop tag, if that had been relevant to the study, could be associated with hypothesis 1 and bargain hunting. All in all, though, these tags set items apart from the rest, so I think they all belong in hypothesis 2.

Explain Table 10.

In Table 10, we can see the coefficients, which help us determine whether a change in an independent variable makes the event (dependent variable) more likely or less likely. Generally, positive coefficients make the event more likely, and negative coefficients make the event less likely. An estimated coefficient near 0 implies that the effect of the predictor is small. We can also see the p-values represented by asterisks, which convey the significance level of each test. If a p-value is less than the alpha level of 0.01 (*), the variable is said to have high statistical significance. If it is less than 0.05 (**), the variable is said to be statistically significant. If it is less than 0.10 (***), the variable is said to have low statistical significance. For the dependent variable "increase in favorites", in the narrow regressions, new with tags has high statistical significance, and price gap, favs on day 1, sustainability, and premium all have low statistical significance. In the consolidated model, price gap has high statistical significance, favs on day 1 is statistically significant, and premium has low statistical significance. The table also includes the pseudo R-squared, which helps indicate model fit. A rule of thumb that I found to be quite helpful is that a pseudo R-squared ranging from 0.2 to 0.4 indicates a very good model fit. By that standard, the model isn't particularly strong. The pseudo R-squared is probably low in part because we don't have much data, and with more data it might have been higher. Still, I don't think this means the model doesn't fit our data.
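
To make the pseudo R-squared less abstract, here is a minimal sketch of one common version, McFadden's, which compares the log-likelihood of the fitted model with that of an intercept-only model. This is an assumption: the answer above does not say which pseudo R-squared was reported, although the 0.2 to 0.4 rule of thumb is usually quoted for McFadden's. The log-likelihood values are invented for illustration.

```python
def mcfadden_pseudo_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - (model log-likelihood / null log-likelihood).

    ll_null is the log-likelihood of a model with only an intercept.
    """
    return 1.0 - ll_model / ll_null


# Hypothetical log-likelihoods (not taken from Table 10).
print(round(mcfadden_pseudo_r2(ll_model=-92.0, ll_null=-100.0), 3))  # weaker fit
print(round(mcfadden_pseudo_r2(ll_model=-70.0, ll_null=-100.0), 3))  # inside 0.2 to 0.4
```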

Explain Table 11.

In Table 11, we can see the coefficients, which help us determine whether a change in an independent variable makes the event (dependent variable) more likely or less likely. Generally, positive coefficients make the event more likely, and negative coefficients make the event less likely. An estimated coefficient near 0 implies that the effect of the predictor is small. We can also see the p-values represented by asterisks, which convey the significance level of each test. If a p-value is less than the alpha level of 0.01 (*), the variable is said to have high statistical significance. If it is less than 0.05 (**), the variable is said to be statistically significant. If it is less than 0.10 (***), the variable is said to have low statistical significance. For the dependent variable "shopping cart activity", in the narrow regressions, uniqueness has high statistical significance, and price gap, favs on day 1, sustainability, and premium all have low statistical significance. In the consolidated model, favs on day 1 is statistically significant, and premium and sustainability have low statistical significance. The table also includes the pseudo R-squared, which helps indicate model fit. A rule of thumb that I found to be quite helpful is that a pseudo R-squared ranging from 0.2 to 0.4 indicates a very good model fit. By that standard, the model isn't particularly strong for the narrow regressions, but it is a good fit for the consolidated model.

Explain Table 12.

In Table 12, we can see that all estimated coefficients are presented in the form of odds ratios. The odds ratio compares the odds of two events. The odds of an event are the probability that the event occurs divided by the probability that the event does not occur. We used odds ratios to understand the effect of the independent variables. For continuous predictors, odds ratios that are greater than 1 indicate that the event is more likely to occur as the independent variable increases. Odds ratios that are less than 1 indicate that the event is less likely to occur as the independent variable increases. For categorical predictors, the odds ratio compares the odds of the event occurring at 2 different levels of the predictor, level A and B. Odds ratios that are greater than 1 indicate that the event is more likely at level A. Odds ratios that are less than 1 indicate that the event is less likely at level A. For the dependent variable "sold", we can see that new with tags, premium, and favs on day 1 have an odds ratio over 1, indicating that an item is more likely to be sold when it has these characteristics or more favs on day 1. The shopping cart variable has an odds ratio of less than 1, indicating, interestingly, that an item is less likely to be sold as more people add it to their shopping cart. We can also see the p-values represented by asterisks, which convey the significance level of each test. If a p-value is less than the alpha level of 0.01 (*), the variable is said to have high statistical significance. If it is less than 0.05 (**), the variable is said to be statistically significant. If it is less than 0.10 (***), the variable is said to have low statistical significance. In the narrow regressions, new with tags and shopping cart have high statistical significance, and favs on day 1 and premium are statistically significant. In the consolidated model, new with tags has high statistical significance, favs on day 1 and shopping cart activity are statistically significant, and the premium variable has low statistical significance. The table also includes the AUC, which helps indicate model fit. In general, an AUC of 0.5 suggests no discrimination, 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is considered excellent, and more than 0.9 is considered outstanding. For the narrow regressions, the model either suggests no discrimination or is considered acceptable, and for the consolidated regression, the model is considered excellent.
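
For intuition about the AUC, one way to read it is as the share of (sold, unsold) item pairs in which the model gives the sold item the higher predicted probability, with ties counting as half. The sketch below computes it that way in plain Python. The labels and scores are invented, and this is not necessarily how our statistics software computed the figure.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sold and one unsold item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # sold item ranked above unsold item
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = sold, 0 = not sold, with predicted probabilities.
sold = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]
print(auc(sold, predicted))
```

With these four hypothetical items, three of the four sold/unsold pairs are ranked correctly.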

How do your conclusions relate to online shops, like Amazon for example?

Regarding what we found on the power of likes, and making likes visible to create a popularity funnel: I have found that many different online secondhand platforms use this, and I think it, in a way, replaces a common function of "new good" online platforms, which is rating. Since secondhand platforms only have 1 or a limited number of each item, there can be no rating system. Rating systems indicate how popular an item is by how many ratings it has and how good those ratings are. Favoriting mimics this popularity mechanism. I have also seen the ability to like on "new good" online platforms, but usually you don't see the number of favorites; it just adds the item to your wishlist. I think this is because the rating function is already in place.

Why did you include "on day 1" in Hypothesis 3: "Items that have more likes on day 1 will be liked, added to the shopping cart, and sold more often than those with fewer likes."?

In this context, "day 1" refers to the first time you open up the web page and look at the products. My "day 1" is different from your "day 1" depending on what date and time of day we are looking at the products, and therefore the number of likes varies. It could also be rephrased as something like "on the user's initial encounter".

Could you go into further detail on why the following data points were omitted: link, day-added, product category, size, "Price Drop"-tag, and day-sold?

Link, product category, and size were all descriptive variables and not useful in our statistical analysis, plus they didn't connect to any of the hypotheses. The "Price Drop" tag was removed because basically none of our items ended up having this tag. Day added was removed, because all of our data points were added on the same day, and day sold was removed, because we didn't measure how fast items were selling, but just whether they sold or not.

What is a logistic regression approach?

Logistic regression is a predictive analysis used to describe data and to explain the relationship between one dependent binary variable and one or more independent variables, which can be a mix of both continuous and categorical.

Why was it chosen as the optimal regression model for the categorical dependent variable, sold?

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary (categorical, but 2 categories), and sold is a binary variable. The logistic regression makes no assumptions about the distributions of the predictor variables, which can be a mix of both continuous and categorical variables, while the dependent variable is of categorical nature.

You say that "variable resales_price was dropped for multicollinearity." Can you explain this? What is multicollinearity?

Multicollinearity occurs when independent variables in a regression model are correlated with each other, so it's basically correlation among the predictors. Naturally, our different price variables are to some extent correlated, as you can see in the correlation matrix in Table 6, and because this correlation can negatively affect our results, resale price was dropped. We could've also dropped the retail price variable instead. When you're interpreting the results, you say "holding all other factors constant, variable 'x' has an effect of 'y' on sales", but when some variables are correlated, you're not able to hold all other factors constant, because they change along with the variable you're looking at.
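
To show why the three price variables move together, here is a small sketch that computes the Pearson correlation by hand. The prices are invented, and it assumes price gap is calculated as resale price minus retail price (which would make it a negative number, as in our descriptive statistics); the exact formula is not stated in the answer above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical prices in dollars (not our data).
retail = [40.0, 80.0, 120.0, 200.0]
resale = [10.0, 22.0, 30.0, 48.0]
# Assumption: price gap = resale - retail, so it is negative.
price_gap = [rs - rt for rt, rs in zip(retail, resale)]

print("retail vs resale:   ", round(pearson_r(retail, resale), 3))
print("retail vs price gap:", round(pearson_r(retail, price_gap), 3))
```

Because the gap is built from the other two prices, any two of the three variables carry nearly all of the information, which is why one of them can be dropped.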

Why was the OLS regression model not chosen as the optimal regression model for continuous dependent variables increase_favs and shopping_cart? Why did the data set not meet the optimal conditions for an OLS regression?

OLS regression makes certain assumptions about your data, including that the errors are normally distributed. To meet this assumption when a continuous response variable is skewed, a transformation of the response variable can produce errors that are approximately normal. Often, however, the response variable of interest is categorical or discrete, not continuous. In this case, a simple transformation cannot produce normally distributed errors. The increase in favs variable and the shopping cart variable count the number of occurrences of these specific events. The distribution of counts is discrete, not continuous, and is limited to non-negative values. There are two problems with applying an ordinary linear regression model to these data. First, our distributions of count data are positively skewed, with many observations in the data set having a value of 0. The high number of 0's in the data set prevents the transformation of a skewed distribution into a normal one. Second, it is quite likely that the regression model will produce negative predicted values, which are theoretically impossible.
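
The second problem can be shown with a tiny example. The sketch below fits a straight line by ordinary least squares to a handful of invented counts with many zeros and then predicts a negative count. The variable names and numbers are hypothetical and only meant to illustrate the point.

```python
def fit_line(xs, ys):
    """Simple OLS fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: favs on day 1 (predictor) and times added to cart (count).
favs_day1 = [0, 1, 2, 3, 4]
cart_adds = [0, 0, 0, 1, 4]

intercept, slope = fit_line(favs_day1, cart_adds)
prediction_at_zero = intercept + slope * 0
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"predicted cart adds for an item with 0 favs: {prediction_at_zero:.2f}")
```

The fitted line predicts fewer than zero cart adds for an item with no favs, which cannot happen with a count.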

How did you collect the data?

On the first day of our data collection, we searched Thredup by each of the four brand names, and information about the first 30 items that popped up on the search results pages was gathered in a spreadsheet (all information gathered about each item is in Table 3 in the Appendix). Then, each day over the next week, each item was manually tracked, and activity, such as the number of likes, shopping cart activity, and whether it was sold or not, was updated and added to the spreadsheet under the day it was observed.

What is an OLS regression model?

Ordinary least squares (OLS) regression is a statistical method of analysis that estimates the relationship between one or more independent variables and a dependent variable. It fits a straight line by minimizing the sum of the squared differences between the observed and predicted values of the dependent variable.

Explain priming.

Priming is a phenomenon whereby exposure to one stimulus influences a response to a subsequent stimulus, without conscious guidance or intention. An example is me telling you it's a bad movie before you see it.

How is this relevant to H3: "Shopping cart activity was also included as an independent variable in the sales regression and showed a consistent and positive effect (.438* and .245**). "?

Since favoriting was a level of commitment and the number of favs on day 1 was significant to shopping cart activity and sales, we also commented that shopping cart activity was significant to sales as well. It doesn't have to do with H3 specifically, but how each level of activity leads to another.

How does your study contribute to the knowledge of biases?

Since our study relied heavily on biases, our findings gave us many interesting insights into them. For example, in this study, we found that the anchoring bias had a strong influence initially (in the first stage of commitment), but wore off as commitment level increased. Also, through testing the significance of variables, we were able to see which biases had a stronger effect than others in the online secondhand shopping context, depending on how significant the variables associated with them were. Our H3 (popular items), which was tied to the bandwagon effect, conflicted with the "uniqueness" aspect of H2, which was tied to the oddball effect. Given that H3 was fully supported, whereas H2 was not, it could be deduced that the bandwagon effect has a stronger effect on consumers than the oddball effect. These specific aspects would be interesting to delve into in further research.

How did you come up with these three stages of commitment?

Since we were only able to observe the products from the consumer-side of the website, these were the three major measurable stages of interest that we could record. These three stages had a logical order of action, which would be first to like or favorite the product to show initial interest, second to add it to cart to show interest in buying, and third to actually buy the product. If we could have been given access to, for example, the website analytics, we could have added a few more stages, such as impressions and click-through rate, as initial stages.

Explain Table 5.

So first, all variables had 120 observations, because we had a sample size of 120 items. Next, the mean is the average of the values we coded for that specific variable. Premium and sustainable items averaged to .5 because exactly half were sustainable and half were premium. For the other categorical variables, all of their means are closer to 0, meaning that most items did not have tags or were not considered unique and most items were also not sold. For the continuous variables, the mean is pretty self-explanatory: the average retail price was a little over $100, the average resale price was a little over $25, the average price gap was a little over $75, the average number of favorites on day 1 was close to 7, the average increase in favorites was 2-3, and the average number of times an item was added to the shopping cart was close to 1. Next, standard deviation measures how spread out the numbers are. A low SD indicates that the data points tend to be very close to the mean, whereas a high SD indicates that the data points are spread out over a large range of values. For the categorical variables, the SD is low because the value can only be 1 or 0; the lower the SD, the more the values were alike, which in these cases means mostly 0. Continuous variables dealing with prices had a higher SD, especially retail price and price gap, whereas the variables dealing with shopping cart activity and favs had a lower SD. The minimum and maximum are also pretty self-explanatory. Two things to point out, though. First, for the price gap, it's kind of the opposite: the largest price gap is the minimum because it's a negative number. Secondly, there were a few values for the increase in favs variable that were -1, meaning that, by the end of the week, their likes had decreased by 1 instead of increasing. We had to change these to 0 since the dependent variable cannot take a negative value in negative binomial regressions.
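
Two points from this answer can be checked with a few lines of Python: the mean of a 0/1 variable is simply the share of items coded 1, and negative values can be set to 0 before a count regression. The lists below are constructed for illustration rather than copied from our spreadsheet, and the helper name is hypothetical. Whether Table 5 reports the sample or the population standard deviation is not stated, so the sample version used here is an assumption.

```python
import statistics

# 120 items, exactly half coded 1 (premium) and half coded 0 (not premium).
premium = [1] * 60 + [0] * 60
premium_mean = statistics.mean(premium)   # share of premium items
premium_sd = statistics.stdev(premium)    # sample standard deviation (assumption)


def clip_negative_counts(values):
    """Replace negative values with 0, since a count outcome cannot be negative."""
    return [max(0, v) for v in values]


# Hypothetical increases in favs, including one item that lost a like.
increase_favs = [-1, 0, 3, 2, 0]
cleaned = clip_negative_counts(increase_favs)

print(premium_mean, round(premium_sd, 3))
print(cleaned)
```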

Why was shopping cart activity also included as an independent variable in the sales regression?

The earlier stages overlap. As soon as an item is sold, the other stages do not continue, so it would be more logical to leave it out.

What are stepwise regressions?

Stepwise regression is a way to build a model by adding or removing independent variables. Stepwise regressions were done for each stage of commitment. We started each test with no independent variables, adding one at a time as the regression model progressed. We didn't want to overlook weaker effects. If we had just done a consolidated regression, the variables with a strong effect would have drowned out those with a weaker effect.

Explain Table 6.

Table 6 displays a correlation matrix of our variables. This is a table of correlation coefficients, where each cell shows the correlation, meaning the statistical relationship, between two different variables. We use this correlation matrix to summarize data before going into more advanced analyses. The closer a coefficient is to 1 or -1, the stronger the linear relationship, and the closer it is to 0, the weaker the linear relationship between the 2 variables. So based on this table, we can see that premium has a moderate positive linear relationship with retail price and resale price, and a moderate negative relationship with price gap. Premium also has a weak negative linear relationship with increase in favorites and shopping cart activity. Sustainability has a weak positive linear relationship with favorites on day 1 and shopping cart activity. The "either tag" variable has a strong positive linear relationship with the rare find tag variable and a moderate positive linear relationship with the new with tags variable. Uniqueness has a weak positive linear relationship with favorites on day 1 and shopping cart activity. The retail price has a strong positive linear relationship with the resale price and a strong negative linear relationship with the price gap. The resale price has a strong negative linear relationship with the price gap and a weak positive linear relationship with favorites on day 1, increase in favorites, and shopping cart activity. Favorites on day 1 has a moderate positive relationship with shopping cart activity and a weak positive relationship with the increase in favs. Lastly, the increase in favs has a weak positive linear relationship with shopping cart activity. We checked the correlation matrix before running any regression to see if there was something that could impact our results. We saw a strong correlation between retail price, resale price, and price gap, which is why we decided to drop the resale price variable. We also noticed a correlation between the "either tag" variable and both the rare find tag and the new with tags variables. We didn't drop any of these variables because we were still interested in their effects, so we made sure not to use all of them together in one regression: we used either both individual tags together or the "either tag" variable. Technically, we could have done the same for resale price, but we weren't actually interested in the effect of resale price, so we dropped it.

Explain Table 7-9 and Figure 2-4.

Tables 7-9 show the frequencies of the number of increase in favorites, the number of times an item was added to cart, and whether an item was sold or not. The tables also display the percent for each frequency and a cumulative percent for each time the variable's value increased by 1. Figures 2-4 show these frequencies in a histogram. In all of these histograms, since the prominent peak lies to the left with the tail extending to the right, this is a right-skewed dataset. In this case, the median is less than the mean of the dataset. We can also see from the histograms that the data is overdispersed. Overdispersion is the presence of greater variability (statistical dispersion) in a data set than would be expected based on a given statistical model.
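
A quick numeric check that goes with the histograms is to compare the median with the mean (right skew) and the variance with the mean (overdispersion relative to a Poisson model, where the two would be roughly equal). The counts below are invented for illustration, and the function name is hypothetical.

```python
import statistics


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest overdispersion."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts with many zeros and a long right tail (not our data).
counts = [0, 0, 0, 0, 1, 1, 2, 3, 9]

print("mean:  ", round(statistics.mean(counts), 2))
print("median:", statistics.median(counts))
print("variance / mean:", round(dispersion_ratio(counts), 2))
```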

How did you code the data?

The 120 collected observations were manually coded. From the relevant information (which excluded size, for example), we created 13 variables, which are explained and summarized in Table 4. For the variables that could be answered with yes or no, such as whether an item is premium or not, we used binary labels, with 1 meaning that it had that characteristic and 0 meaning that it did not. The rest of the variables were numerical observations, so they were coded as numbers.

What differentiates Thredup from normal, "new good" e-retailers?

The main differences between Thredup and a new-good e-retailer are that Thredup only has one or a very limited quantity of each item and is lacking features like product reviews, extensive descriptions, and high-quality visuals with models.

If you could do this project over, what would you do differently?

The main thing I would have done differently is how we selected and validated brands. In retrospect, I think that by sampling items from more popular brands, we would have seen more buyer behavior, which may have led to different results, or possibly just further validated the results we came up with and added a few findings. The same could have happened if we had collected data over a longer period of time. Also, the way we validated the perception of brand sustainability could have been better. Even if we couldn't have done it in person, we could have done it over video chat, so that we knew that none of the participants researched the brands before answering. It would have taken more time, though.

Why do you use consumer biases?

These biases and phenomena happen to everyone, and we cannot completely avoid or eliminate cognitive biases. Therefore, businesses can take advantage of these innate consumer biases by first understanding how and why they arise and how they impact our lives and our work. For example, a business can take advantage of the anchoring bias when an item is on sale. When the original price is placed next to the current lower price, consumers may be more inclined to purchase the product without even realizing how this bias is affecting them.

How could you turn this study into an interpretivist study?

To turn this into an interpretivist study, we would have to integrate human interest into it. Interpretivists believe that you can only understand reality through social constructs, such as language, consciousness, and shared meanings. They also value qualitative methods over quantitative methods. So we could have done interviews with users of the platform and coded those interviews looking for shared meanings. Alternatively, we could still have done an observational study, but one that observed how people interacted on the website and recorded which items they interacted with.

You write, "we also aim to find valuable insights that can provide guidance for both emerging and established businesses in the online secondhand industry regarding strategic and UX decisions." Do you answer this? What have your studies shown that can be used in UX decisions?

User experience design refers to the process of manipulating user behavior through the usability, usefulness, and desirability provided in the interaction with a product. When we refer to UX decisions, we mean decisions about the characteristics and functions of an online platform that exploit consumer biases and thus manipulate user behavior.

"Performing and discussing multiple regression models to confirm the statistical results, e.g. through log-transforming the data and performing an OLS regression, would add further credibility to our results, however goes beyond the scope and length of this research paper." What does this mean?

Usually, when researchers have more space and time, they perform multiple analyses, because you never know in advance which model is best for your data. Normally you would run the analysis with several different models to see which one fits best, for example by comparing R-squared. We actually tried log-transforming the data in Stata to reduce the skewness and then running an OLS regression, but we faced some issues in doing so, so we went with the negative binomial regression.
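As a rough illustration of what "log-transforming the data and performing an OLS regression" involves, here is a small Python sketch with one predictor. It is a stand-in for the idea, not the analysis we attempted. The data are made up, the helper names are hypothetical, and using log(1 + y) instead of log(y) is an assumption made here because count data contain zeros, for which the plain logarithm is undefined.

```python
import math


def ols_fit(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations: a binary predictor and a skewed count outcome.
premium = [0, 0, 0, 1, 1, 1]
shopping_cart = [0, 1, 7, 0, 0, 2]

# log1p(c) = log(1 + c), so zero counts stay at zero instead of failing.
log_cart = [math.log1p(c) for c in shopping_cart]

intercept, slope = ols_fit(premium, log_cart)
print(intercept, slope, r_squared(premium, log_cart, intercept, slope))
```

The coefficients of such a model describe changes in the log of the outcome, so they are not directly comparable with coefficients from a regression on the raw counts.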

Why did you choose to do stepwise regressions?

We chose to do stepwise regressions because it gave us the ability to manage large amounts of potential independent variables, and by watching the order in which variables are removed or added, we could find out valuable information about the quality of the independent variables. In order to achieve the most reliable results and to be able to observe weaker coefficients as well, only the control variable and relevant independent variable(s) for each hypothesis were analyzed together. Finally, one consolidated regression with all independent variables was performed for each dependent variable.

How could you have done it qualitatively?

We could have done it qualitatively by interviewing users of the platform, but since we did not know many users, we thought a quantitative method would produce more reliable results. If we did know of any users, it would have been nice to supplement our quantitative findings with qualitative findings and incorporate method triangulation. We could have therefore cross analyzed the two datasets for overlaps and differences, which would have given more validated, reliable results.

How did you determine uniqueness? Why did you determine uniqueness that way? What were the pros and cons of doing so? Is there another way you could have determined uniqueness?

We determined uniqueness by each of us going through every item and deciding whether we thought the item was unique or not. If an item was considered unique by at least one of the two of us, we coded it as unique. We did it this way because uniqueness is so subjective that we thought it would be best to consider both of our opinions. In retrospect, maybe it would have made more sense to label an item as unique only if we both considered it unique, since other people would then have been more likely to perceive it as unique as well. But this would have affected the results only slightly, because we disagreed only a few times out of all the items we coded.
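The two coding rules discussed above (unique if either coder says so, versus unique only if both agree) can be compared directly. This is a minimal sketch with made-up ratings. The function names and the ratings are hypothetical and only show how the rules differ and how often two coders agree.

```python
def code_unique(coder_a, coder_b, rule="either"):
    """Combine two coders' 0/1 ratings into one uniqueness label per item."""
    if rule == "either":
        return [int(a or b) for a, b in zip(coder_a, coder_b)]
    if rule == "both":
        return [int(a and b) for a, b in zip(coder_a, coder_b)]
    raise ValueError("rule must be 'either' or 'both'")


def agreement_rate(coder_a, coder_b):
    """Share of items on which the two coders gave the same rating."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical ratings for six items (1 = unique, 0 = not unique).
coder_a = [1, 0, 0, 1, 0, 1]
coder_b = [1, 0, 1, 1, 0, 0]

print(code_unique(coder_a, coder_b, rule="either"))
print(code_unique(coder_a, coder_b, rule="both"))
print(agreement_rate(coder_a, coder_b))
```

The fewer items the coders disagree on, the less the choice of rule changes the coded variable.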

Did you contact Thredup for data? What are the weaknesses of your way of collecting data? What could you have done if you got data from Thredup?

We did not contact Thredup for data. All of our data was gathered through observation based on what any user could see. Since we did not contact Thredup, the data we could record and observe was limited. If we could have had access to website analytic data from Thredup, then we could have included variables such as impressions and click-through rate into our analyses and expanded our levels of commitment to include these as well. Although this would have been interesting, I think our data still resulted in exciting findings.

How do you answer your research question, "What factors influence shopping behavior on online secondhand clothing platforms?"

We identify two major biases that influence shopping behavior, risk compensation and the bandwagon effect, and we link these biases to specific functions or characteristics of a website that in a way exploit these biases to increase buyer behavior. So in this way, we do answer what factors influence shopping behavior on online secondhand clothing platforms.

What was your sample size?

We initially recorded the first 30 products of each brand page, resulting in a sample of 120. By the end of the week, we had a sample size of 169 items, because new items were added, mostly Reformation items (28 new Reformation items vs. 4 for A.A., 5 for Ganni, and 2 for Target). When we started doing our analyses, we realized that the items we added were negatively affecting the other values, so we decided to drop the additional observations. For example, we measured the increase in favorites over these 8 days, but the items that were added to the sample later were observed for fewer days and so on average also had a smaller increase in favorites, which would have skewed our data.

Would you say that Thredup is "assessing product quality from a rather neutral perspective" if they are the ones selling the clothes?

We meant that they had a system of categorizing items by four levels of condition: new with tags, like-new, gently used, and signs of wear. I believe that Thredup would be honest in assessing product quality because if they were not, they would get a lot of returns and not so many loyal customers, especially since our results showed item quality to be a very important factor.

Why did you choose Thredup?

We specifically chose Thredup because of its size, maturity, and product and user bandwidth, all of which increase the validity of our study. Besides, the fact that Thredup acts as a reseller, taking professional pictures and assessing product quality from a rather neutral perspective, ensures a certain level of standardization and conformity and removes possibly confounding factors like photo quality, which would be relevant on a C2C secondhand platform such as eBay.

Why is a negative binomial regression the optimal regression model for continuous dependent variables increase_favs and shopping_cart?

We use negative binomial regression because our dataset is skewed and overdispersed. The negative binomial distribution is a discrete distribution, which makes it useful for modeling count data. This model is better than an OLS regression model for this data because it is built for a skewed, discrete distribution. A negative binomial model proved to fit this data well because the frequency of 0 increase in favorites and 0 shopping cart activity is so high, while a few data points recorded much higher counts, making the variance much larger than the mean. Therefore, the negative binomial model was the more appropriate choice.
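The point about the variance being much larger than the mean can be written as a formula. The notation below is not from our paper. It assumes a common parameterization of the negative binomial model (often called NB2), with mean \mu and a dispersion parameter \alpha, and contrasts it with the Poisson model, which forces the variance to equal the mean.

```latex
\text{Poisson: } \operatorname{Var}(Y) = \mu
\qquad
\text{Negative binomial (NB2): } \operatorname{Var}(Y) = \mu + \alpha \mu^{2}, \quad \alpha > 0
```

Because \alpha \mu^{2} is positive, the negative binomial model allows the variance to exceed the mean, which is what overdispersed count data need. As \alpha approaches 0, it reduces to the Poisson model.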

Why did you use a quantitative method?

We used a quantitative method because we basically only had access to the data on Thredup's website, and since there is no review system where we could have gathered observations of consumer opinions and thoughts, we were left to take a quantitative approach through observation and gathering numbers and other measurable factors. It's important to note, though, that observational studies are very speculative, and we had no way of knowing whether or not our results would yield any interesting findings, so there definitely was some risk in using this method.

What is deductive research logic? Why did you use it?

We used deductive reasoning because instead of developing our own theory, we are testing hypotheses based on already existing theories, which come from our background research and the literature on consumer biases. We start by making hypotheses that we found compelling and then test their implications with data, moving from a more general level to a more specific one. We would have used inductive research if there had been little to no research on the subject.

What were the pros and cons of validating your brand choices over social media rather than in person?

Well, we wanted to reach North American consumers, since Thredup is only available in that region. By reaching out on social media, we were able to reach these consumers, which we would not have been able to do in person since neither of us is located in the US at the moment. So the positive thing about doing it this way was that we were able to reach the consumers we wanted. The negative, of course, is that although we only presented the names of the brands, the participants had the ability to research them without us knowing. We would have preferred that the participants not research the brands and instead either guess based on their previous knowledge of the brands or just say that they didn't know them. This is important because users of the platform are probably not going to research all the brands that they find to see if they are sustainable or not. Therefore, doing this validation in person and presenting people with the brand names written on cards would have been better, but it was impossible in our case.

Can you explain how the average online secondhand shopper has changed, and that the factors influencing shopping behavior differ from the traditional perception of secondhand shopping behavior?

What we meant to say with that is that the average secondhand shopper is evolving with the introduction of online secondhand platforms, and maybe it's a new set of secondhand shoppers with a different set of values and motivations than traditional, brick-and-mortar secondhand shoppers. I think the two ways of secondhand shopping offer two totally different experiences that would appeal to different shoppers. For example, if you were looking for something specific, maybe the online shopping platform would be more appealing to you where you can filter based on various criteria, whereas if you just wanted to browse, maybe real-life thrift shopping would be more appealing, where you can feel and touch the clothes and find something very unique. These are all assumptions, but it would be interesting for future researchers to delve into this.

Explain the oddball effect.

A person is more likely to remember a unique stimulus that is presented amongst a stream of homogeneous stimuli. For example, if you are looking at products and the majority are white T-shirts, but then you come across a colorful graphic tee, you are more likely to remember it or spend more time on it.

Explain the bandwagon effect.

A phenomenon whereby the rate of uptake of beliefs, ideas, fads, and trends increases the more that they have already been adopted by others. For example, in fashion, many people begin wearing a certain style of clothing as they see others adopt the same fashions.

Explain risk compensation.

People typically adjust their behavior in response to the perceived level of risk, becoming more careful where they sense greater risk and less careful if they feel more protected. People with low risk tolerance are going to proceed with things, like driving for example, with more caution than people with higher risk tolerance, because they feel less protected while driving.

Explain anchoring.

A bias where an individual depends too heavily on an initial piece of information offered (considered to be the "anchor") to make subsequent judgments during decision making. For example, if you first see a T-shirt that costs $1,200 and then see a second one that costs $100, you're prone to see the second shirt as cheap.

