We consider the problem faced by a company that sells a product under warranty while having only partial information about the product's reliability. The product can fail in several ways, and each failure type is associated with an inherently different repair cost. If the product fails within the warranty period, then the company is required to pay the repair cost. The company does not know the probabilities associated with the different failure types, but it learns them as sales occur and failure information accumulates. If the failure probabilities turn out to be too high and it becomes costly to fulfill the warranty coverage, then the company may decide to stop selling the product, possibly replacing it with a more reliable alternative. The objective is to decide if and when to stop. By formulating the problem as a dynamic program with Bayesian learning, we establish structural properties of the optimal policy. Since computing the optimal policy is intractable due to the high-dimensional state space, we propose two approximation methods. The first method decomposes the problem by failure types and provides upper bounds on the value functions. The second method is based on a deterministic approximation and provides lower bounds on the value functions. Computational experiments indicate that the policy from the first method provides noticeable benefits, especially when it is difficult to form good estimates of the failure probabilities quickly.
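
To make the learning step concrete, the following minimal sketch illustrates one way failure probabilities could be updated as failure information accumulates. It assumes an independent Beta prior for each failure type, which is a common conjugate choice but is an assumption here and is not stated above; the names `BetaBelief` and `expected_warranty_cost` are hypothetical. The sketch covers only the belief update and the resulting expected repair cost per unit, not the stopping decision or either approximation method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BetaBelief:
    """Belief about one failure type's probability (assumed Beta prior)."""
    alpha: float  # pseudo-count of units that failed with this type
    beta: float   # pseudo-count of units that did not

    def update(self, failures: int, units: int) -> "BetaBelief":
        """Return the posterior after observing `failures` among `units`."""
        if not 0 <= failures <= units:
            raise ValueError("failures must lie between 0 and units")
        return BetaBelief(self.alpha + failures,
                          self.beta + units - failures)

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the failure probability."""
        return self.alpha / (self.alpha + self.beta)


def expected_warranty_cost(beliefs: Sequence[BetaBelief],
                           repair_costs: Sequence[float]) -> float:
    """Expected repair cost per unit sold under the current beliefs."""
    if len(beliefs) != len(repair_costs):
        raise ValueError("one repair cost is needed per failure type")
    return sum(b.mean * c for b, c in zip(beliefs, repair_costs))


# Two failure types with illustrative (made-up) priors and repair costs.
priors = [BetaBelief(1, 1), BetaBelief(1, 3)]
observations = [(1, 2), (0, 4)]  # (failures, units observed) per type
posteriors = [p.update(f, n) for p, (f, n) in zip(priors, observations)]
cost_per_unit = expected_warranty_cost(posteriors, [100.0, 40.0])
```

A company could compare a quantity such as `cost_per_unit` against its per-unit revenue, but a rule of that kind would be myopic; the dynamic program described above also accounts for the value of what further sales would reveal.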