Towards a practical robotic chef: A review of relevant work and future challenges

Robotic chefs are a promising technology that can improve the availability of quality food by reducing the time required for cooking, thereby decreasing food's overall cost. This paper clarifies and structures design and benchmarking rules in this new area of research, and provides a comprehensive review of technologies suitable for the construction of cooking robots. The diner is the ultimate judge of the cooking outcome, therefore we focus on explaining human food preferences and the perception of taste, and on ways to use them for control. Mechanical design of robotic chefs at a practically low cost remains a challenge, but some recently published gripper designs, as well as whole robotic systems, show the use of cheap materials or off-the-shelf components. Moreover, technologies like taste sensing, machine learning, and computer vision are making their way into robotic cooking, enabling smart sensing and therefore improving controllability and autonomy. Furthermore, objective assessment of taste and food palatability is a challenge even for trained humans, therefore the paper provides a list of procedures for benchmarking a robot's tasting and cooking abilities. The paper is written from the point of view of a researcher or engineer building a practical robotic system, therefore strong priority is given to solutions and technologies that are proven, robust, and self-contained enough to be part of a larger system.

by decreasing the number of chores people do daily, as well as by bringing more healthy, tasty food into people's lives as the time cost of cooking is removed. It can also make restaurant operations cheaper, thereby increasing the number of available options and the affordability of eating out.
Despite the magnificent progress of technology and its infiltration into ever more parts of our lives, cooking and food preparation is still a low-tech affair. Most meals are still prepared in home kitchens with the use of manual tools (Taillie, 2018). These tools have been known for centuries, and only ergonomic and production-cost improvements have been made to them since.
While various kitchen gadgets continue to promise productivity increases, they do not enjoy wide adoption, possibly due to their task-specific nature. Therefore, the time spent on cooking remains a significant cost (Jackson & Viehoff, 2016; Wood, 2017). For an average person, even one living in a well-off country, the only two choices when lacking time for cooking are to delegate food preparation either to factory automation or to other people (Wolfson et al., 2016). The first option involves buying mass-produced meals from a factory that, by the requirements of mass production, is located far away from the consumer. As a result, the food must be prepared in a way that enables transporting it over long distances and storing it on shop shelves for a long time. The outcome is the necessity to freeze the meals, to replace natural flavor with added fat and sugar, and an overall reduction of meal quality (McGowan et al., 2017; Tian et al., 2016).
The second option, delegating cooking to other people, is usually done by visiting canteens and restaurants or by using food delivery apps (Kowalczuk & Czarniecka-Skubina, 2015). The main problem with this choice is a cost strain affecting both the cook's and the diner's side. For a cook, the commercially viable types of food are limited by the necessity of a large number of people in the area demanding that particular food, as well as by the limited number of options on a menu, a step necessary to streamline restaurant operation. As a diner, the choice is limited by the area you live in and the cost of eating out (Wolfson et al., 2016). Therefore, it becomes obvious that a productivity increase in cooking that does not affect human health must be achieved in a much more decentralized manner than in a factory, which brings the robotic chef as a solution.
The paper makes some broad assumptions when choosing the topics to discuss. We assume that the robotic chef works by interacting with the environment to produce a dish that will be palatable for the specific diner that will eat it. This statement defines the knowledge required for future robotic chef engineers. Understanding the human perception of food is necessary, as this determines the outcome of a robotic chef's work. Furthermore, such an engineer needs knowledge of how to construct a robotic chef and use robots for manipulation tasks. Making the robotic chef more adaptable will also require the implementation of sensing, most likely taste and vision, as well as machine learning. Finally, the robotic chef engineer will face the challenge of objectively measuring some of the robotic chef's senses, notably taste, and the chef itself.
Finally, knowing the robotic chefs' place in the economy would be beneficial.
The main goal of this publication is to present a comprehensive review of the latest robotic chef platforms: how they are built, their performance, and their perception by the public. We also present various technologies useful for the construction of robotic chefs. The presentation of these technologies does not focus on the most recent or popular aspects, but on those already shown working as part of a robotic chef, or at least judged by the authors as robust enough to be suitable for integration into a robotic chef. For the needs of this review, we categorize technologies and topics regarding the robotic chef into three major categories: well-funded research, emerging technologies, and future challenges. They are discussed consecutively in Sections 2-4.

| Human experience
The quality of a cooked dish is ultimately judged by a subjective diner. This judgment depends on the perception of the dish rather than on an objective measurement. A large number of factors can influence the diner's perception, varying from geopolitical influences, through tradition and family customs, to individual preferences formed by unique experiences. A robotic chef needs to take all of them into consideration to cook a meal that will be considered tasty. In this section, we review many of these influences that emerge from social factors and group dynamics.

| Social environment and perception
Perception of a particular food as desirable and, in many cases, its physical availability depend strongly on geographical location and local regulations. Major sustenance styles are a popular example, where geography and climate determine the main nutrition sources (Lobell et al., 2011). These sustenance styles then determine the diet and cuisine, as it is a logistical impossibility that a group of nomadic shepherds and a group of farmers would eat the same dishes. Even small differences between sustenance styles are known to create huge cultural differences (Talhelm et al., 2014). Statewide regulations also make a huge impact on available ingredients and people's behavior. An example is the sugar levy, which reduced soft drink consumption by 18% and made manufacturers reformulate most such drinks (Dickson et al., 2023). The family environment is also an important factor in food perception. Food aversions are a very frequent family influence, with 38% of people experiencing a food aversion in their life, with an average onset age of 16 and an average length of 11 years (Mattes, 1991). The social arrangement for eating is also strongly influential (Herman et al., 2003). For example, people tend to eat more when in a group than when eating alone (Pliner et al., 2007). This social facilitation is present also in nonhuman animals (Tolman, 1965). Stereotypes also play an important role in the perception of food palatability (Vartanian et al., 2007), and the effect of social impression is strong (Goldman et al., 1991). More research is needed to decide whether a robot can have the same effect (Jie & Gunes, 2020; Niewiadomski et al., 2022).
Finally, it is customary to match certain dishes together, and some algorithms for this are being tested (Nishimura, Ishiguro, et al., 2022).

| Individual preferences and acquired taste
Individual humans have personal preferences and also very different nutritional requirements (Association et al., 2006; Government Dietary Recommendations, 2016; National Institutes of Health & Others, 2020), which also change with physical activity (Onywera et al., 2004). Individual preferences are prone to classical conditioning. For example, pairing a taste with sugar causes a lasting preference for that taste even after the sugar is removed (Zellner et al., 1983). These preferences also depend on the sheer number of encounters with certain foods or tastes. Children are known to dislike new foods and tastes until they experience them multiple times (Birch et al., 1987). Even when neophobia is overcome, more exposure still increases the "liking" response, which is called the mere exposure effect (Pliner, 1982). Moreover, age strongly affects taste perception. The most significant effect of age difference is measured for the intensity of sweetness perception (Zandstra, 1998). For example, the young (ages 8-12) group is tolerant of very high concentrations of sugar in drinks, while older groups tend to dislike these high concentrations (De Graaf & Zandstra, 1999; Desor & Beauchamp, 1987).

| Safety and acceptability
Robotic chefs' cooking also needs to be perceived as safe and desirable to enable practical implementations. Research conducted in South Korea (Song et al., 2022) shows that timely and robust performance is crucial for building a positive image of robotic chefs.
Trust towards service robots is known to increase client satisfaction (Seo & Lee, 2021). Tourists interviewed about robotic chefs respond with positive opinions and show interest in visiting such a restaurant, but also claim that it depersonalizes the experience of dining (Fusté-Forné, 2021). Another study (Hwang, Park, et al., 2020) shows that hedonistic, cognitive, and functional reasons convince people to accept robotic chefs. The visual appearance of the robot also matters, and humanoid robots are perceived as better cooks (Zhu & Chang, 2020). Robots also have some advantages over humans, as they are blamed less by customers for eventual bad performance (Leo & Huh, 2020). On the other hand, high capital costs and privacy risks appear in customer surveys (Hwang et al., 2021). Some focus groups also point to a possible negative impact on the labor market (Zemke et al., 2020). We summarize all of the opinions discussed in this section in Table 1.
Safety is also an important part of the perception of a robotic chef. Interviewing customers and restaurant managers has shown that robotic chefs are expected to increase hygiene standards (Seyitoğlu et al., 2021). Some robotic systems come with reflex systems that quickly react to hot items and protect the robot from harm (Junge et al., 2022). Sensorization can also improve the safety of the cooked food by controlling the cooking process or by checking ingredients for adulteration (Andrychowicz et al., 2020; Tian et al., 2019; Valente et al., 2018) and quality (Dias et al., 2014; Qiu et al., 2015). Robots can also help maintain hygiene by loading dishwashers (Voysey et al., 2021). Cameras combined with machine learning can also make the robot safe around people by tracking them (Mohammadi Amin et al., 2020) and other objects (Petit et al., 2017). Visual detection of allergens is also an engineering possibility (Mishra et al., 2022).

| Food handling and manipulation
Cooking requires handling various ingredients and tools, many of them delicate. A human chef uses a variety of manipulation types, including pouring/spreading, cutting, peeling, opening, mixing, grasping/holding, cracking, cleaning, and so forth. Each of these requires a different approach depending on the manipulated object and the environment it is in. Two approaches are used to tackle this problem. One is introducing structure to the environment, which can be realized by attaching tags to objects, manufacturing tools that are easier to grasp, or providing the robot with precut and prepacked ingredients. While this may seem like special treatment for the robot, it is common practice in commercial kitchens. The other approach assumes that the robot should be able to work in any environment that humans can work in, including ingredient variability and the presence of extra produce, tools, and items that are not part of the task. The first approach is usually the one used in industry, while the second has never been fully realized, but pushing systems in this direction makes them more flexible. Systems that aspire to this need dexterous and delicate grippers. Moreover, some of the properties of dishes and ingredients are unobservable from the outside, therefore the grippers also have an important sensory function. The sensory tasks may include tactile sensing, stiffness measurement, temperature measurement, and many more. On top of that, suitable algorithms are needed to find the path to the products and the best way to grasp them. In this section, we discuss these challenges.

| Grippers and tactile feedback
One approach to the design of grippers for food handling is to make them soft and compliant. We can see examples of such grippers in Figure 1a-d. This effect can be achieved in various ways; for example, Hughes et al. (2020) proposed a fully pneumatic design, where both actuation and tactile sensing are done using pneumatics. In Figure 1b we see a similar approach, where a gripper with a pneumatic tactile sensor and proprioception is used to measure and assess food readiness (Sochacki, Hughes, et al., 2022). A similar task of assessing mango ripeness was also approached with multiple tactile sensors (Scimeca et al., 2019), with the system shown in Figure 1e. A similar, yet simpler system of this type was shown to assess the firmness of an eggplant (Blanes et al., 2015). Fin Ray grippers (Crooks et al., 2016) are also widely used for fruit harvesting (Zhou et al., 2021) and can perform fruit ripeness assessment (Almanzor et al., 2022) when fitted with an electrical impedance tomography kit (Bera & Iriguchi, 2014). The design of a similar gripper and its testing on various fruits are shown by Liu et al. (2018). Acoustic methods can also be used for ripeness assessment (Macrelli et al., 2013). Some more bulky objects found in kitchens need a firmer approach to grasping. For example, lettuce can be peeled robotically using suction cups (Hughes et al., 2018), but more sturdy cage-like grippers are used when harvesting it (Birrell et al., 2020). Another approach is to use hard components with multiple joints that create a semisoft structure. For example, Gafer et al. (2020) built a gripper that handles objects delicately by scooping them with fingers that form a "cage" around the grasped product. Rigid grippers can also be padded with soft material that cushions the delicate produce before contact sensors stop further closing of the gripper, allowing the handling of soft produce like tomatoes (Russo et al., 2017). Finally, some grippers incorporate sourness, spiciness, and sweetness sensors at their fingertips (Ciui et al., 2018).

TABLE 1 List of gathered opinions and relations found in various research on the perception of robotic chefs.
As an important note, all of these grippers are more or less task-specific and cannot be as robust as a human hand. Research into human-like robotic hands has yielded some impressive results, like the Shadow Hand mastering in-hand manipulation (Andrychowicz et al., 2020) and passive hands grasping objects (Nonaka et al., 2023). Regardless, we judge these solutions as too complicated and delicate to suit integration into a robotic chef, as they bring too much complexity for just one of many parts in a system.

| Manipulation algorithms and learning
One way to perform complicated manipulation with a robot is to use teleoperation, where a human controls the robot from far away. This approach was used, for example, to decorate cakes (Bolano et al., 2019). Another way to make the robot imitate manipulation done by a human is to record the human's movements. For example, a combination of kinesthetic and demonstration learning was shown to learn pan-flipping (Ko et al., 2022). Some systems even use haptic information as input for the learning algorithm (Sundaresan et al., 2022).
Another approach is using machine learning to generate a policy that will govern the movement of a robot based on the current situation.
Sometimes different parts of the same task require different manipulation techniques, and this idea was used for programming robotic arms for the task of scooping (Bhattacharjee et al., 2019). Later, this idea was extended to a two-arm robot using a Markov model (Grannen et al., 2022). Moreover, a neural network was trained to plan dough-stretching tasks based on a point cloud (Qi et al., 2022). Other techniques were too slow to do that task in real time and used a look-up table generated beforehand (Kim, Ruggiero, et al., 2022). On top of that, a convolutional neural network (CNN) was trained to plan and execute food skewering (Feng et al., 2019). Deep imitation learning was also used to teach robots to handle bottles (Kim et al., 2023) and peel bananas (Kim, Ohmura, et al., 2022). Handling pans and woks was also learned by tracking human chef movements and training a transformer network to coordinate two robotic arms (Liu et al., 2022).
Manipulation based on digital models was also attempted. Robotic cutting of avocados with a strain-gauge-sensorized knife was explored; the tactile data were fed into various algorithms, including a variation of nearest neighbor, reinforcement learning, and nonadaptive algorithms (Xu, Xian, et al., 2023). Simpler movements, like slicing, are done using physical models (Mu et al., 2019). Moreover, in one experiment, tactile sensing was able to predict the maximum grasping force that does not damage food produce (Ishikawa et al., 2022).

| Integration
Current integrations of robotic chefs come in different shapes and forms. Some systems start from the factory-automation style of machines, ending up with a state-machine type of system with limited sensing and manipulation (Inagawa et al., 2021; Spyce Ltd., 2021). These are built to cook one specific dish and are limited by the fact that most actuation has one degree of freedom (DOF). We can see examples of these in Figure 2b,e.
The other approach, viewed by the authors as superior and most scalable, is to use a robotic arm with 6 or 7 DOF. This way is more complex but also allows much more capability. This approach was used for tasks like salad assembly (Dexai, 2021), making hot dogs (Zabka, 2022), frying sausages (Mauch et al., 2017), making scrambled eggs (Sochacki et al., 2021), and attempts at general-purpose robotic cooking. The last popular approach is to use a humanoid-shaped robot with two arms. This approach adds complexity to the system but makes some tasks easier due to the ability to use the second hand (e.g., it can hold a bowl in place while mixing). Example cooking done with this kind of robot includes pancake cooking (Beetz et al., 2011) and decorating cookies (Bolano et al., 2019). Additional DOF make these systems more expensive, but some untested designs try to make this affordable (Jiang & Zhou, 2022).

FIGURE 1 Collage of robotic grippers especially useful for food handling and sensing. (a) Sensorized pneumatic gripper handling a potato chip (Hughes et al., 2020), (b) sensorized gripper able to assess cooked-ness (Sochacki, Hughes, et al., 2022), (c) gripper using conductance measurement to assess ripeness (Almanzor et al., 2022), (d) soft gripper for fruit harvesting (Zhou et al., 2021), (e) tactile feedback to assess mango ripeness (Scimeca et al., 2019), (f) vacuum cup gripper peeling lettuce (Hughes et al., 2018), and (g) gripper with chemical sensors at its fingertips (Ciui et al., 2018).
We would also like to note the "additive approach" to cooking, which was shown capable of producing personalized dishes (Mizrahi & Zoran, 2023). Three-dimensional (3D) printing is one of the emerging trends in additive manufacturing. For example, 3D printing of chocolate enables shapes with more detail than molding (Chachlioutaki et al., 2022). By varying the infill type, different textures and toughness of chocolate can be achieved (Mantihal, Prakash, & Bhandari, 2019). Using appropriate additives can also differentiate the products of 3D printing (Mantihal, Prakash, Godoi, et al., 2019). Three-dimensional printing of nonmeltable food like meat (Handral et al., 2022) is also being researched. Similarly, 3D printing of cakes was attempted, where the dough was extruded by the 3D printer and the resulting product was then baked (Guénard-

| Taste sensing

…hardware, as well as their advantages and disadvantages. We show a short digest of these in Figure 3.

Electrochemical sensors-These sensors use various electrodes that are placed in the sample, and their voltage is then compared to a reference electrode. These electrodes can be made selective by covering them with various membranes (Kobayashi et al., 2010). A few commercial versions of these sensors are available (ASTREE, 2022; Insent, 2022). They were used for various cooking-related tasks like tracking the composition of chicken (Liu et al., 2017), mushroom taste evaluation (Phat et al., 2016), detecting honey adulteration (Oroian et al., 2018), classification of oils (Dias et al., 2014), and many more (Andrychowicz et al., 2020; Ouyang et al., 2013; Saha et al., 2019; Valente et al., 2018). The crucial limiting factor is that these sensors can be used only on liquid samples (e.g., chicken needs to be homogenized into a pulp to be measured). The achievable measurement frequency is also rather low.
Conductance sensors-This type of sensor works by sending an AC signal of known voltage between two electrodes and measuring the resulting current (Ramos et al., 2008). These sensors were used for various tasks, including the detection of milk adulteration (Sadat et al., 2006). They are very sturdy and suitable for applications even in harsh conditions (Benjankar & Kafle, 2021; Gorji et al., 2017). They have also been proven to work on some solid-state foods (Sochacki, Abdulali, et al., 2022; Sochacki et al., 2021).
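To make the principle concrete, here is a minimal sketch (not the implementation of any cited sensor) that recovers conductance as the ratio of RMS current to RMS voltage of the AC excitation; the amplitudes, frequency, and sample count are illustrative assumptions, and real e-tongues would additionally calibrate against reference solutions:

```python
import math

def rms(samples):
    """Root-mean-square value of a sampled waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def conductance(voltage_samples, current_samples):
    """Estimate conductance G = I_rms / V_rms for a known AC excitation.
    Electrode geometry, polarization, and calibration are ignored here."""
    return rms(current_samples) / rms(voltage_samples)

# Synthetic excitation: 1 V amplitude sine, 2 mA amplitude response,
# one full period sampled at 1000 points (all values are illustrative).
n = 1000
v = [math.sin(2 * math.pi * k / n) for k in range(n)]
i = [0.002 * math.sin(2 * math.pi * k / n) for k in range(n)]
G = conductance(v, i)  # approximately 0.002 S (2 mS)
```

Because both waveforms share the same shape, the RMS ratio reduces to the amplitude ratio, which is why a single-frequency excitation suffices for this toy case.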
Visual sensors-Visual observation at various wavelengths can reveal a lot about taste. An optical sensor with spectrum analysis was used to differentiate between food produce (Shimazu et al., 2007). For example, cooking progress can be estimated (Mauch et al., 2017), and cooking time can be adjusted based on the looks of the ready dish (Junge et al., 2020). The major advantage of such systems is the ability of nondestructive sensing from a distance.
Biosensors-Biosensors consist of a biological sensing component and an electronic transducer (Pandey, 2019). These sensors are not yet tested in robotic applications, but they have a unique ability to target single particles (Zhang et al., 2020), which gives the best possible selectivity. Specifically targeted particles of interest include caffeine (Xu, Zhao, et al., 2023), sucrose (Song et al., 2014), and capsaicin (Xiao et al., 2021). They can also closely reproduce human taste receptors for tasting sweetness (Jeong et al., 2022), umami (Huang et al., 2019), and saltiness (Jing et al., 2023). It must be noted that the long-term stability and working temperature range of these sensors are not known.
Ionic polymer metal composite sensors (Zhao et al., 2023) were also used for e-tongue construction (Das et al., 2023), which was then used to detect milk adulteration (Pal et al., 2023), but their selectivity is unclear, their measurement times are too long, and their use for robotic chefs is questionable. Other types of sensors can be used too (Yamaguchi, 2021).

Combining sensors-While measuring the food directly was discussed extensively, indirect sensing is also very useful. For example, gas sensors could be combined with taste sensing (Li et al., 2023; Liu, Fan, et al., 2023; Ma, Shen, et al., 2023; Tian et al., 2019), or a large battery of sensors can be used, for example, to study taste differences between chicken and hen meat (Lee & Kim, 2021). Sensor fusion gives the best resolution and variety of measurements, but it is complicated and has not yet been used by robots.

| Feedback
Taste is an extremely important measure of the robotic chef's output, as well as a source of feedback for learning and for adjusting to a variable environment. In this section, we list previous attempts at implementing this task.
The first approach is to use the robot's onboard sensors. A recent study (Sochacki et al., 2021) showed robotic chef cooking as a closed-loop task. It is done by sampling a human-cooked dish and storing this measurement as a goal for a control loop, while the robot cooks dishes on its own, evaluates them, and adjusts the cooking process to close the gap between human- and robot-cooked food. This was expanded on by using a battery of salinity, pH, and temperature sensors (Shi et al., 2023).
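The closed-loop idea can be sketched as follows; the linear sensor response, step size, and tolerance below are hypothetical placeholders for illustration, not values from the cited studies:

```python
def cook_and_taste(salt_g):
    """Stand-in for cooking a dish and reading an onboard salinity
    sensor; the linear response 0.9 * salt is a made-up placeholder."""
    return 0.9 * salt_g

def closed_loop_seasoning(target_reading, step=0.1, tol=0.02, max_iter=100):
    """Adjust the amount of salt until the robot's own sensor reading
    matches the reading stored from the human-cooked reference dish."""
    salt = 0.0
    for _ in range(max_iter):
        error = target_reading - cook_and_taste(salt)
        if abs(error) <= tol:
            break
        salt += step if error > 0 else -step
    return salt

target = cook_and_taste(2.0)          # reading sampled from the human dish
salt = closed_loop_seasoning(target)  # converges near 2.0 g
```

The key point is that the controller never needs the sensor's absolute calibration: it only drives the robot's own reading towards the reading taken from the human-cooked reference.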
Another way to get feedback on dish quality is to ask people to score the dishes. This idea was used by Junge et al. (2020) to find the optimal parameters to cook an omelette. Designing a scoring process is still more art than engineering, but in this case, scoring taste, texture, and looks on a point scale was sufficient. Extensive studies on these topics are available (King et al., 2013).
Although not yet implemented in robotic chefs, human facial expressions can be used as feedback on taste (Chen et al., 2020; Dibeklioğlu & Gevers, 2018) by classifying the expressions into negative, neutral, and positive.

| Modeling
The previously discussed feedback techniques worked only on finished dishes, influencing the cooking procedure afterwards. The ultimate goal of taste perception is to enable predictive control, so the feedback loop can be applied during cooking, therefore considerably reducing food waste during training. For this, modeling of taste and of the effects of cooking is necessary.
Simple model of taste-One of the most popular ways to model human taste is to simplify it to five basic tastes: sweet, sour, salty, bitter, and umami (Yarmolinsky et al., 2009). In this model, each of these basic tastes is determined by the concentration of a certain substance or set of substances; for example, NaCl is one of the substances influencing saltiness (Liu et al., 2013), H+ is the key factor in sourness, and quinine is one of the factors deciding bitterness. Taste buds translate these concentrations into electrical signals and transmit this information to the brain. In this setup, the taste is changed by changing the amount of these substances, where some of them are perceived as more intense than others (Yamaguchi et al., 1971). This perception is still subject to all the effects specified in Section 2.1.
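A toy version of this model can be written as a mapping from substance concentrations to a five-taste intensity vector; the substances and relative weights below are illustrative assumptions, not measured psychophysical values:

```python
# Hypothetical relative intensity weights per substance; real values
# would come from psychophysical measurements (cf. Yamaguchi et al., 1971).
WEIGHTS = {
    "sucrose":   ("sweet", 1.0),
    "NaCl":      ("salty", 1.0),
    "citric":    ("sour", 3.0),
    "quinine":   ("bitter", 50.0),
    "glutamate": ("umami", 2.0),
}

def taste_vector(concentrations_g_per_l):
    """Collapse substance concentrations into the five-basic-tastes model."""
    tastes = {t: 0.0 for t in ("sweet", "salty", "sour", "bitter", "umami")}
    for substance, c in concentrations_g_per_l.items():
        taste, weight = WEIGHTS[substance]
        tastes[taste] += weight * c
    return tastes

broth = taste_vector({"NaCl": 8.0, "glutamate": 1.5})
# broth["salty"] == 8.0 and broth["umami"] == 3.0
```

Such a vector gives a controller a low-dimensional target to regulate towards, even though it deliberately ignores the contextual effects discussed in Section 2.1.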
Thermal processing-Human food perception also includes temperature, texture, and tactile information. To control these, the robot needs to understand and model thermal processing. Microwave cooking has been popular since the 1960s and is one of the methods easiest to simulate numerically (Chandrasekaran et al., 2013; Ohlsson & Bengtsson, 1971). This is because most of the process is governed by Maxwell's equations and Lambert's law, which are generally used as the electromagnetic equations. These give rise to various models (Curet et al., 2006; Geedipalli et al., 2007; Liu et al., 2005; Oliveira & Franca, 2002; Taher & Farid, 2001). In practice, exposure to microwaves rarely results in evenly distributed heating (Birla & Pitchai, 2017). This is due to factors like the dielectric constant, thermal conductivity, and the size and localization of a dish (Oliveira & Franca, 2002; Ryynänen & Ohlsson, 1996; Zhang & Datta, 2000).
Airflow, mass transfer, and surface radiation also have a significant effect on the food cooked (Verboven et al., 2003).
Boiling can also be modeled using heat transfer equations (Paciulli et al., 2018), as can traditional ovens (Blikra et al., 2019). A similar model was already used in a robotic chef to predict the required cooking time (Sochacki, Hughes, et al., 2022). There are also studies that model the effects of heating type on food healthiness (De Pilli & Alessandrino, 2020; Nandasiri et al., 2023). As we see, many models of cooking are available, and we think they will be a crucial part of robotic chef self-correction.
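As a minimal illustration of such cooking-time prediction (far simpler than the cited heat transfer models), a lumped-capacitance (Newton heating) sketch solves for the time at which the core reaches a target temperature; the time constant and temperatures are assumed values:

```python
import math

def time_to_core_temp(t_env, t0, t_target, tau_s):
    """Cooking-time prediction with a lumped-capacitance (Newton heating)
    model: T(t) = T_env + (T0 - T_env) * exp(-t / tau). The time constant
    tau_s must be identified experimentally for a given food item."""
    if not t0 < t_target < t_env:
        raise ValueError("target must lie between start and bath temperature")
    return tau_s * math.log((t_env - t0) / (t_env - t_target))

# Egg in boiling water: 100 C bath, 20 C start, 80 C target core,
# tau = 250 s (all values assumed for illustration).
t = time_to_core_temp(100.0, 20.0, 80.0, 250.0)  # 250 * ln(4), about 347 s
```

The lumped model ignores internal temperature gradients, which is exactly what the cited finite-element models add; it is nonetheless enough for a first estimate that sensing can then correct.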

| Computer vision
In the field of robotic chefs, computer vision is used in three major roles: feedback, learning from humans by observation, and finding ingredients in unstructured environments. The use of these methods is becoming progressively easier due to the many publicly available models that can be easily integrated, with or without retraining.

| Feedback
Feedback is one of the uses of computer vision. For example, the robot that peeled lettuce used two stages of color segmentation, one to detect the lettuce and another to find its stem, and adjusted its movements accordingly (Hughes et al., 2018). The sausage-frying robot used a simple color threshold to assess sausage readiness and color segmentation to find the sausage's position on the grill (Mauch et al., 2017). A humanoid cooking robot used images from a head-mounted camera to recognize containers, pans, and pots, and acted based on this information (Watanabe et al., 2013). The stir-fry cooking robot used visual feedback of content deformation to adjust the movements in real time to achieve the desired stir-fry effect (Liu et al., 2022). A CNN was trained to assess the esthetics of food arrangement on a plate and paired with an algorithm used by a robot to rearrange the food on the plate (Nagahama et al., 2022). A CNN was also used to assess the cooking state of various foods (Salekin et al., 2019).
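A color-threshold readiness check of the kind used on the sausages can be sketched in a few lines; the RGB threshold and the "done" fraction below are invented for illustration and are not taken from the cited work:

```python
def browned_fraction(pixels, threshold=(150, 100, 60)):
    """Fraction of RGB pixels darker than a brown-ness threshold.
    The threshold triple is invented for illustration."""
    r_t, g_t, b_t = threshold
    browned = sum(1 for r, g, b in pixels if r < r_t and g < g_t and b < b_t)
    return browned / len(pixels)

def sausage_ready(pixels, done_fraction=0.6):
    """Declare the sausage ready once enough of its surface has browned."""
    return browned_fraction(pixels) >= done_fraction

raw = [(210, 160, 140)] * 100                           # pale pink
cooked = [(120, 70, 40)] * 70 + [(210, 160, 140)] * 30  # mostly browned
# sausage_ready(raw) -> False, sausage_ready(cooked) -> True
```

In practice the pixel list would come from a segmented camera region, and the threshold would be tuned against examples judged by a human.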

| Learning by demonstration
Many robotic chefs use computer vision for learning. For example, OpenPose was combined with a depth camera to extract the timing of actions from a human chef (Danno et al., 2022). Another robotic chef combines pose detection with object detection, computes the correlation between wrist movement and the movement of produce, then uses a hidden Markov model (HMM) to extract a recipe from the demonstration and uses the recipe to prepare a dish (Sochacki, Abdulali, Hosseini, et al., 2023). A significant part of these systems is a pose detection neural network that returns the locations of major joints in a video. Many such pretrained systems are publicly available, for example, OpenPose (Cao et al., 2019), DCPose (Liu et al., 2021), and DensePose (Guler et al., 2018).
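The wrist/produce correlation step can be illustrated with a toy 1-D version; the trajectories below are synthetic, and the cited system works on full 2-D/3-D tracks and additionally runs an HMM over such associations:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length 1-D tracks."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def manipulated_object(wrist_track, object_tracks):
    """Pick the object whose trajectory best correlates with the wrist."""
    return max(object_tracks,
               key=lambda name: pearson(wrist_track, object_tracks[name]))

wrist = [0.0, 0.1, 0.3, 0.6, 1.0]
tracks = {"tomato": [0.0, 0.1, 0.31, 0.58, 1.02],    # moves with the wrist
          "pan":    [0.50, 0.50, 0.50, 0.49, 0.50]}  # roughly static
# manipulated_object(wrist, tracks) -> "tomato"
```

The object carried by the hand tracks the wrist almost perfectly, so the correlation alone is often enough to decide which object a demonstrated action manipulates.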

| Object finding
Using computer vision to localize food products is essential for a robotic chef, and the practice stems from agricultural robotics. Computer vision was a part of many automatic harvesting systems. For example, eggplants (Hayashi et al., 2002) and tomatoes (Feng et al., 2018) were detected using color segmentation. Similar systems made it to robotic chefs too; for example, a humanoid cooking robot used a head-mounted camera to recognize containers, pans, and pots based on their edges (Watanabe et al., 2013). Object tracking is also useful for robotic chefs, and RGBD cameras were used for tracking pizza (Petit et al., 2017) and apples (Nguyen et al., 2014).
Three machine-learning tasks are usually used as models for computer vision in food-handling robots: classification, detection, and segmentation. The classification task takes images as input and returns the class of each image. Class choices differ from application to application, with the most basic setup being classification between images containing food and those not containing it (Ragusa et al., 2016). Usually, CNNs are used for this task, and the number of classes can be increased (Attokaren et al., 2017). Retraining is also done for classification; for example, ResNet-50 was retrained for food classes (Ciocca et al., 2017b). Training neural networks for classification tasks requires specific data sets, for example, the Food-101 data set (Bossard et al., 2014), which contains images of 101 dishes with 1000 images each. Many models were trained using this data set (e.g., Attokaren et al., 2017; Martinel et al., 2018). Another such data set is the Pittsburgh fast-food image data set (Chen et al., 2009) (Wang et al., 2021).
Retraining these models for different classes is possible; for example, YOLOv3 was retrained to detect lettuce (Birrell et al., 2020), and YOLOv5 was retrained to detect objects containing allergens. Training neural networks for detection requires data sets labeled with bounding boxes. One data set available for food detection is Allergen30 (Mishra et al., 2022), which focuses on foods containing allergens.
Finally, segmentation assigns each image pixel to one of the object classes, rather than returning just a bounding box. This approach was shown to work on broccoli using feature extraction coupled with a Support Vector Machine classifier (Kusumam et al., 2016). Data sets used for this task need to be labeled by drawing polygons on the images that accurately cover the areas of the picture corresponding to the object classes. UNIMIB2016 (Ciocca et al., 2017a) is one such data set, consisting of images taken in a real canteen; it includes 1027 canteen trays with 3616 food instances belonging to 73 food classes. Other data sets use single ingredients as segmentation classes; for example, FoodSeg103 (Wu et al., 2021) has 103 categories of ingredients labeled on around 9400 images.
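Whichever data set is used, segmentation models are commonly evaluated with per-class intersection-over-union. A minimal, dependency-free sketch over toy label grids (the class ids below are made up for illustration):

```python
def per_class_iou(pred, truth, num_classes):
    """Per-class intersection-over-union between two label grids.

    pred and truth are equally sized 2D lists of integer class ids
    (0 = background). Returns {class_id: IoU} for every class that
    appears in either grid.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p == t:
                # pixel counted once in both intersection and union
                inter[p] += 1
                union[p] += 1
            else:
                # mismatch enlarges the union of both classes involved
                union[p] += 1
                union[t] += 1
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c]}
```

For a 2x4 grid where one pixel of class 1 is mispredicted as background, class 1 scores 0.5 IoU while a perfectly segmented class scores 1.0; averaging these values gives the mean IoU usually reported for models trained on sets like FoodSeg103.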
Data sets made for training models for different tasks also exist.
For example, Nutrition5k (Thames et al., 2021) is a set of images of food labeled with their nutritional value and can be used to train models that do so.ISIA Food-200 (Min et al., 2019) is a data set of images labeled with both the dish type and ingredient list.In all, 197,323 images with 200 categories as well as 319 visible ingredients are contained.ISIA Food-500 (Min et al., 2020) is an extension of this data set containing 500 categories and 399,726 images for further food recognition research.We are providing a list of some foodrelated data sets for computer vision with a note that the list is not exhaustive and we recommend additional research for those who prepare to do research in this field.The list is in Table 2.

| Recipe understanding and learning
Understanding recipes and translating them into a list of programmable actions is another place where machine learning is used. Some attempt to simply make a graph out of many recipes available online (Mizrahi & Shahaf, 2021) and structure the available data. Generative Adversarial Networks take it a step further and translate recipes into Python code (Papadopoulos et al., 2022), but the code is only a sequence of basic kitchen tasks. Similarly, a genetic algorithm can generate new recipes (Antô et al., 2020). Some models can estimate the recipe from an image (Chen & Ngo, 2016; Salvador et al., 2019). Perhaps models with even deeper recipe understanding can be trained based on data sets like Recipe1M (Salvador et al., 2017), which contains images, recipes, and ingredients as a single package. Robots can also use Bayesian optimization on human feedback to adjust parameters in already known recipes (Junge et al., 2020). A robotic chef was shown to translate a recipe into actions using hard-coded text analysis logic and follow them (Bollini et al., 2013).
Video captioning algorithms were also used to extract recipes from an uncut video (Nishimura, Hashimoto, et al., 2022). This sort of system can also be made using off-the-shelf vision models and implemented into a robotic chef system (Sochacki, Abdulali, Hosseini, et al., 2023). Finally, Large Language Models (LLMs) are a powerful, general-purpose tool that could be used for recipe processing. It was shown that an LLM can rewrite a recipe to make it easier to understand (Hwang et al., 2023). Even older LLMs like GPT-2 managed to generate recipes from a prompt containing ingredients and recipe titles (Lee et al., 2020). These models have not yet been integrated into robotic chefs, which we consider to be the obvious next step in the development of this field.
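The hard-coded text analysis route can be sketched in a few lines. The keyword table and toy ingredient list below are hypothetical illustrations of the idea, not the cited system's actual grammar:

```python
import re

# Hypothetical keyword-to-action table; a real system would use a much
# richer, hand-crafted grammar.
ACTIONS = {
    "mix": "MIX", "stir": "MIX", "combine": "MIX",
    "bake": "BAKE", "pour": "POUR", "add": "ADD",
}

def parse_step(sentence):
    """Translate one recipe sentence into (action, ingredients, minutes)."""
    words = re.findall(r"[a-z]+|\d+", sentence.lower())
    # first recognized verb wins; None if the sentence has no known action
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    minutes = None
    for i, w in enumerate(words):
        if w in ("minute", "minutes") and i > 0 and words[i - 1].isdigit():
            minutes = int(words[i - 1])
    known = {"flour", "sugar", "eggs", "milk", "butter"}  # toy ingredient list
    ingredients = [w for w in words if w in known]
    return action, ingredients, minutes
```

Parsing "Mix the flour and sugar, then bake for 20 minutes" yields a MIX action over flour and sugar with a 20-minute parameter; brittleness on sentences outside the keyword table is exactly why the learning-based approaches above are attractive.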

| Modeling of food and cooking
Finally, modeling the effects of actions taken by a robot is a challenge, yet it is a prerequisite for highly adjustable, creative cooking. Problems reducible to simple physics, like the movement of a wok, were modeled as a 2-DOF system and simulated for various parameters (Ko & Hu, 2020), but their effects on food remain uninvestigated. Perhaps a model could be trained to do so, with some data sets that label the results of cooking actions already available (Shirai et al., 2022), or mark a list of ingredients at each step (Zhang et al., 2022). Some progress towards it was made, with models that can generate an image based on a recipe (Zhu et al., 2019). One robotic solution is able to model the effects of boiling on a grasped object (Sochacki, Hughes, et al., 2022) and predict its stiffness after an arbitrary cooking time. Another approach is to have the robot move food around at random, or "play" with it, to learn its representation (Sawhney et al., 2021). Reinforcement learning was also used for cooking-related tasks. For example, Kormushev et al. (2010) used it to improve the motion of a 7-DOF robot for a pancake-flipping task. Flipping pancakes in the air and catching them with a real frying pan was made possible this way.
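To see how simple physics models enter this space, consider the flight phase of a toss (as in wok tossing or pancake flipping), which reduces to drag-free ballistics. A minimal sketch; the point-mass and same-height-catch assumptions are ours, not those of the cited models:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def toss_flight(speed, angle_deg):
    """Ballistic flight of food tossed from the wok rim and caught at the
    same height: returns (air_time_s, horizontal_range_m).

    A point-mass, drag-free sketch; real wok dynamics are 2-DOF and the
    effect of the toss on the food itself remains an open modeling problem.
    """
    a = math.radians(angle_deg)
    vx, vy = speed * math.cos(a), speed * math.sin(a)
    t = 2 * vy / G          # time to return to the launch height
    return t, vx * t        # range = v^2 * sin(2a) / g
```

Even this toy model reproduces useful facts for motion planning, such as a 45-degree launch maximizing horizontal travel at a fixed toss speed, which is the kind of parameter sweep the cited 2-DOF simulation performs at much higher fidelity.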

| From sensing to tasting
As discussed in the previous section, sensor readings need a layer of perception to become taste. Finding or designing this complicated relationship is not easy. Unobvious correlations also frequently exist in this area; for example, an electronic nose was able to determine coffee acidity from the odor of roasted beans (Thazin et al., 2018). This shows that no relationships here are simple and linear, as taste is ultimately subjective and hard to define. Currently, a whole battery of sensors is needed to find the correlation between the measurements of a sample and its taste (Kobayashi et al., 2010). A battery of sensors can also estimate the concentration of substances crucial for human taste (Rodriguez-Méndez & Medina-Plaza, 2014), going as far as analysis of cheese composition (Lipkowitz et al., 2018).
One of the most important uses of taste is to differentiate between foods to aid feeding decision-making. This is in essence a classification task, which is commonly reported for electronic taste. It was also shown that imitating chewing can increase the sensor's ability to distinguish between foods (Sochacki, Abdulali, et al., 2022). This is because different foods react differently to mechanical processing; for example, it has no effect on yoghurt, but tomatoes release plenty of juices when bitten. Such classification, after acquiring data from sensors, can be done using many algorithms.
One comparison of these algorithms can be found in Zhang et al. (2019).
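As a concrete illustration of such classification, one of the simplest algorithms is a nearest-centroid classifier over multichannel sensor readings. The feature channels and all numbers in the example are synthetic, not measurements from any cited system:

```python
def nearest_centroid(train, sample):
    """Classify a sensor reading by the closest class centroid.

    train maps a food label to a list of sensor vectors (e.g., readings
    from a conductance/pH battery); sample is one new vector.
    """
    def centroid(vectors):
        # per-channel mean of the training vectors for one class
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def dist2(a, b):
        # squared Euclidean distance between two readings
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = {label: centroid(vs) for label, vs in train.items()}
    return min(centroids, key=lambda label: dist2(centroids[label], sample))
```

With two synthetic channels (say, conductance and pH), a handful of labeled readings per food suffices to classify a fresh sample; the comparison cited above covers far stronger alternatives such as SVMs and neural networks that handle overlapping classes.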
It was also theorized that human taste can be reproduced using a learning robot that produces its energy from organic matter using microbial fuel cells (MFCs) (Sochacki, Abdulali, Cheke, et al., 2023). Because the MFC has chemical requirements as to its feed, the robot can learn what it likes depending on the current output of the cell.

| Measuring taste
It is a common requirement for studies that the results be measurable and reproducible. Similarly, industry needs these properties for quality assurance. In this section, we discuss possible ways to measure taste and to compare two sensors or systems.
First, the comparison is usually done between sensors of one type. In the case of taste, it would not be useful to use "conductance" or "temperature" as a category. We would expect something more like a "saltiness sensor" that is to reproduce the subjective saltiness experience of a human. Therefore, parameters like sensing range need readjustment, as they now matter in relation to the range of human senses. The sensing range is normally given by a pair of numbers, the minimum and maximum values that can be accurately measured (usually within 5% of the real value) by the sensor:

R_s = [x_{s,min}, x_{s,max}],

where x is the measured value and the subscript s corresponds to a sensor. In the case of taste sensors, we are not interested in any measurements made outside the range where a change of the measured value makes a difference to human subjective experience. We can name this set of values the Human Range (HR), which is specific to each basic taste and substance combination. We can describe it with the following equation:

HR = [x_{h,min}, x_{h,max}],

where h corresponds to human sensing. Now we want to measure the coverage of the values that can be experienced by a human. We call this new measure Human Range Coverage (HRC) and we can write it with the following equation:

HRC = |R_s ∩ HR| / |HR|,

where the subscript h signifies a human, s signifies a sensor, and |·| denotes the length of an interval. It takes values between 0 and 1, corresponding to the fraction of the human sensing range the sensor covers. This measure is still not perfect, as being able to sense different parts of the human sensing range is not equally important. Each measured taste usually has a pleasant range in the middle of the human sensing range; going further away, the pleasantness falls, finally becoming more and more unpleasant.
Therefore, if the taste sensor is to be used for cooking, the most important part of the range is the middle part of the human sensing range, where the pleasant values lie. This is because any values far away from these middle values warrant a significant change in a recipe, which does not require fine sensing. On the other hand, when coming closer to these middle values, fine sensing becomes more important, as that is where precise adjustment of recipes, cooking, and control happens. Therefore, we propose an improvement to the HRC by scaling it with a cosine function. We choose the frequency of the cosine so that the HR is the equivalent of exactly one period. The scaling function S can therefore be represented by the following equation:

S(x) = (1/2)(1 − cos(2π(x − x_{h,min})/(x_{h,max} − x_{h,min}))) for x ∈ HR,
S(x) = 0 otherwise,

where the second line shows our disregard for the sensor's performance outside the human sensing range. Finally, we get our improved measure of Effective Human Range Coverage (EHRC) by weighting the areas covered by the sensor with S(x). EHRC is therefore specified by the following equation:

EHRC = ∫_{R_s ∩ HR} S(x) dx / ∫_{HR} S(x) dx.

This value also lies in the range [0, 1] thanks to the scaling factor in the denominator. The construction of S(x) according to Equation (4) makes sure that the scaling factor in the denominator is finite, and the excessive range of the sensor is not considered when computing the numerator.
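One plausible numerical reading of this definition can be sketched as follows; the midpoint integration is our implementation choice, while the cosine weighting and the clipping of the sensor range to the HR follow the text:

```python
import math

def ehrc(sensor, human, steps=10_000):
    """Effective Human Range Coverage of a sensor range over the human
    range (both (lo, hi) tuples), using the cosine weighting S(x) that
    completes one period across the human range and is zero outside it.
    """
    h_lo, h_hi = human
    width = h_hi - h_lo

    def s(x):
        if x < h_lo or x > h_hi:
            return 0.0
        return 0.5 * (1 - math.cos(2 * math.pi * (x - h_lo) / width))

    def integral(lo, hi):
        # midpoint-rule numerical integration of S over [lo, hi]
        if hi <= lo:
            return 0.0
        dx = (hi - lo) / steps
        return sum(s(lo + (i + 0.5) * dx) for i in range(steps)) * dx

    # numerator: weighted overlap of the sensor range with the human range
    num = integral(max(sensor[0], h_lo), min(sensor[1], h_hi))
    return num / integral(h_lo, h_hi)
```

Running it on the three scenarios of Figure 4 (for an HR of, say, [2, 8] in arbitrary units) gives EHRC = 1 for a sensor spanning the whole HR, roughly 0.82 for a narrow sensor centered on the middle of the HR, and a lower value for a wide sensor shifted below the HR middle, matching the qualitative ordering discussed below.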
Let us analyze how this measure works in three different scenarios shown in Figure 4. Let us assume a taste parameter, for example, sourness. For each such parameter, there will be a range detectable by humans. Lowering this parameter below the lower boundary would be perceived by humans as the same stimulus, and raising the parameter above the upper boundary would also be perceived as the same as the upper boundary. In Figure 4a we see a sensor that covers the whole HR as well as large areas on both sides of the HR. In this case, S(x) cuts all the ranges that are not relevant to human taste, resulting in EHRC equal to 1. Moving to Figure 4b, in this case we again have a sensor that covers a range much wider than human taste, but it is moved towards lower values so it does not cover the whole HR. As it still covers the middle of the HR, EHRC is still high, perhaps 0.6, thanks to the S(x) scaling. Finally, looking at Figure 4c we see a sensor with a range much smaller than the previous one, and even smaller than the HR. Yet, due to its range covering the most important values for cooking, S(x) will make it score higher than the previous sensor, perhaps around 0.8. This approach may now measure the usefulness of the sensor for assessing foods in ranges relevant to human enjoyment. Unfortunately, this approach is just a starting point that brings a range of new challenges. For example, it is not clear what unit we should choose for each tasting parameter. One way is to do so linearly with the concentration of the sensed particle. Take saltiness as an example, where the main sensed particle is the Na+ ion: the concentration of the ion does not translate linearly to human perception. The solution to this problem is not obvious.
Perhaps using a logarithmic scale, as pH does for the H+ concentration, is a way forward. Or perhaps our S(x) should be expanded by an additional term to take care of this relationship.
Another issue is dealing with the nonselectivity of the sensor and of human taste. Many substances can cause the same experience for a human. Continuing the saltiness example: K+ ions also activate the saltiness receptors and there is no way for a human to tell them apart.
Similarly, we have seen in this article a conductance sensor used as a saltiness sensor. While it worked in that specific scenario, it could be tricked with substances that are conductive but not perceived by humans as salty, for example, H+ ions. No literature is available on how much the measurement of taste should challenge the low selectivity of a sensor.
Finally, the required form of a sample is a huge question that we have not included in the discussion yet. When assessing a sensor, should we require it to work on solid samples, or allow it a blended wet sample? What if it requires additional chemicals to be added to the sample to make its measurements properly? While these allowances start to sound like building a laboratory rather than a sensor, we must remember that human receptors are provided with these luxuries in the form of mastication and saliva. Each of these answers creates a separate test scenario where different sensors will rise to the top. Therefore, in the next section we will try to conceive a small set of scenarios that will be useful for benchmarking a robotic chef as a whole.

| Benchmarking environments
We have investigated in the previous section the complexity of measuring the robot's tasting abilities, and how likely the test setup is to change the outcome. When we want to benchmark overall cooking, the complexity and number of possible tasks grow to unfeasible numbers. Therefore, we want to take some lessons from already existing benchmarking environments for complicated, vaguely defined tasks. For example, FluidLab (Xian et al., 2023) is an attempt to build a benchmarking environment for robotic fluid manipulation, which is definitely a subset of robotic cooking. This environment limits the number of tasks to 7 (some with two difficulty settings). It includes some cooking tasks like latte art and ice cream pouring. Unfortunately, this is a virtual environment only, which is not feasible for benchmarking the cooking of whole dishes.
The author believes that in the end a physical task is needed to do proper benchmarking. This is due to the lack of simulation environments that can handle the simulation of cooking, which would include simulations of both liquids and rigid bodies, chemical changes, and heat transfer, let alone rendering the simulation realistically for the robot's sensors. While each of these is possible on its own, no simulation currently covers all of them at once.
We also predict that having a single benchmarking environment is not a great idea, since it bottlenecks possible constructions to make them fit that environment and emphasizes specific skills. This falls too easily into the trap of becoming a benchmarking environment for a very specific set of skills and capabilities rather than for actual cooking, which is the efficient preparation of desired food. On the other hand, a major project like building a robotic chef also requires a division of labor, and bringing talent from other fields is also important.
Therefore, to maximize the probability of the field making progress, we would like to propose the following set of benchmarking environments.
First, a benchmark of general cooking, where a basic kitchen is provided with the freedom to choose robot-specific tools. The score in such a task should be assigned based on factors like the variety of dishes the robot can cook, the speed it can do so with, and its ability to make the dishes fit the diner's individual taste preference. This is to simulate a company building an actual commercial robotic chef, where they have the possibility of adjusting the environment to their liking and focus on the end effect.
Second, a benchmark for cooking-related manipulation is needed. This is also the first of the task-specific or field-specific benchmarks we discuss. In this case, the use of tools should be limited to a standard set of off-the-shelf tools designed for humans. Tasks here should include handling as many tools as possible at a basic level of proficiency, and excelling at specific tools at increasing levels of difficulty (e.g., using the knife for harder and harder tasks like cutting tougher materials, cutting fruits with large seeds, or filleting meat).
Further benchmarks should be made for taste sensing and the ability to produce a dish of the desired taste and quality under as wide a variation in environment and ingredients as possible. All manipulation can be made as simple as possible in this case: ingredient positions can be preprogrammed and tools designed specifically for the task. This is to allow focus on the application of sensors and machine learning that can adjust the cooking process accordingly. The tasks in this benchmark should include copying a dish presented to the robot and achieving the desired taste with varying ingredients, for example, different ripeness and different kinds of tomato.
Finally, the ability to identify and pick up different tools and ingredients is another benchmark. This benchmark is about computer vision, eye-hand coordination, and versatile grippers. The task here is to identify a desired tool or ingredient using computer vision and retrieve it in environments of increasing difficulty. Possible tasks include picking items from the fridge or a shopping bag, or rinsing them under water. More challenges can be introduced by simulating failure recovery, like retrieving an object that fell by mistake into a bin or under a table. Graphics that show these benchmarking approaches and their roles are shown in Figure 5.
There is a very limited number of examples of benchmarking used to compare multiple robotic systems on the same task. Perhaps the best example is the PUB.R robotic cooking competition at the ICRA 2023 conference. The provided environment was focused on the last discussed category: identification and grasping of tools and ingredients in an unstructured environment. The scores were assigned based on the number of successful pick-and-place actions in unstructured environments. This is an example of one of the field-specific benchmarks, as from the experience of participating, small changes in the test structure would make certain parts of the task much easier or harder. Therefore, the environment has a huge effect in channeling the solutions towards one specific approach. This problem is still unsolved, and this paper unfortunately will not propose a better answer than the proposed separation of benchmarks, but the author recognizes the need to do so.

| Practical application
While robotic chefs gain more traction as a research topic, opinions on the practical feasibility of robotic chefs in the industry are mixed (Spence, 2023). News outlets also bring both very positive (12 million jobs will be lost to automation by 2040, 2022) and very negative (McDonald's CEO: Robots won't take over our kitchens, 2022) news. This section first analyzes the advantages of robotic chefs over human chefs and then theorizes possible applications that make the most of these advantages.

FIGURE 5: The possible categories of benchmarks that cover both the general performance of an integrated system as well as benchmarks focused on its specific subsystems.

| Comparative advantages of robotic chef
It is obvious that robotic chefs cannot yet compete with a human in many categories. On the other hand, they possess many qualities that can make them a preferred choice from a business point of view.
24/7 Operation-Most economic activity is focused during day hours. Working during night hours is considered a burden on employees and is under pressure from the social contract. Many countries put legal limits on how much an employee can work outside standard business hours. For example, the UK limits how many night shifts a worker can take (UK Night Working Law, 2023). This limitation is not a problem for a robot that can work at any time. Even if human supervision were needed, one person can potentially supervise multiple robots.
Zero marginal cost of work-Robots cost very little to operate for an additional hour. This is in contrast to a human worker, who is usually compensated for each hour worked. This favors robots in scenarios such as running a restaurant profitably during low demand, therefore allowing its operation in the mornings, at night, and during public holidays.
Data collection and personalized accounts-While a personalized experience is usually associated with a human cook, in reality very little customization is available in restaurants. This is due to the need to streamline the business and the high cognitive load of cooking additional variants of a dish. This is where a robot could be competitive, due to its ability to remember preferences and previous choices for each customer, as well as to deal with long task queues that contain many dish variants.
Space and sanitary requirements-Employing a human chef requires having facilities like changing rooms, toilets, and rest areas. In the case of the robot, these requirements are minimal. Moreover, the robot's shape is flexible and potentially requires less space than a human-operated kitchen, for example, by stacking two levels of a robotic kitchen on a single floor.

| Business models
This section lists possible business models for robotic chefs. These are conceived to maximize the use of robots in areas where they have an advantage.
Elderly care-Elderly care is a branch of the economy where robotically cooked dishes may have a higher value than ones cooked by a human. This is because relying on another human being for basic needs like food is frequently psychologically stressful, especially when the providers of this service are not family members. In this case, the robotic chef would be a tool in the hands of an elderly person to remain independent. Therefore, the work of such a robotic cook would be valued much higher than its physical output.
Dark kitchens-Dark kitchens usually use venues that are harder to commute to and were not built for the purpose of running a restaurant. This is a place to implement robots, as little of the infrastructure needed by human workers is available, the commute is harder than to the average workplace, and the space can be arranged in the most efficient way for robots when starting from an empty building.
Restaurants for frequent users-Storing data is very cheap, and machine learning can extract useful information from the data at almost zero cost. Therefore, robotic chefs could find a use in frequent-user buffets or restaurants. This would work best in workplaces and schools, where physical proximity promotes repeat use of one venue. In this case, the robotic chef could gather information about each customer and offer a version of a dish according to their preference, as well as make appropriate recommendations for them.

| DISCUSSION AND CONCLUSION
The discussion presented in this publication can be summarized in a few paragraphs. First, it becomes obvious that a huge amount of research on human perception of taste has been done and is available to be translated into code and applied in a robotic chef design. Also, public opinion shows mixed feelings towards robotic chefs, which should be taken into consideration when designing one.
The construction of grippers is also a well-researched topic, even when limited to food handling. The main approaches include rigid grippers with force control, soft grippers, suction cups, and pneumatic grippers. Each of them seems to excel with different objects and tasks, therefore a tool changer seems to be a good approach. Advanced, human-like hands are still too complicated to be a part of a larger system unless a sizable group of engineers is constructing it.
Manipulation in robotic chef systems is usually done either with 6- or 7-DOF robotic arms, or in a structured environment where all food manipulation is limited to 1 or 2 DOF. In the first case, the manipulation is either preprogrammed, done with visual servoing, or machine learning is used to find the best trajectories. The dominant sensor is still by far the camera, but tactile, force, and taste sensors are starting to make their way into such systems.
New technologies are slowly being implemented into robotic chefs. For example, electronic taste, while previously used only for classification tasks, is now used as feedback. Similarly, computer vision has reached a level of maturity where it is feasible to retrain or fine-tune a model to integrate it into a robotic chef. Computer vision is slowly proliferating into more complicated tasks such as providing feedback for cooking, manipulation, the robot's environmental awareness, and quality control of cooked meals.
We also forecast that machine learning will allow recipe understanding, modeling of the cooking process, and perception of taste.
Still, many challenges need to be solved for robotic chefs to be useful, comparable, and eventually commercially viable. First, we do not have an established way to benchmark the sensing ability of a robotic chef. Taste especially is an elusive skill that has never been benchmarked for machines before. The flavor of dishes is usually assessed by people, therefore we propose a benchmark that is bound to human taste.
Further, the performance of a robotic chef needs a benchmark that can be used to compare systems. We propose a few tasks, or benchmarking environments, that collectively cover most of the skills required of a robotic chef by covering areas like sensing, perception, robustness, manipulation, and finally, the ability to cook good food.
Finally, we identify the areas where robots can outperform humans. We list the robot's advantages, like the ability to work all shifts for no extra cost, the ability to collect and process data, and the lower social and sanitary needs of the robot. We identify elderly care, workplace and campus buffets, and dark kitchens as the best use cases for robotic chefs. It is still an open discussion whether robots can outperform humans in raw cooking skills. While sensing, manipulation, and artificial intelligence are not yet at a human level, arguments can be made that robots could outperform human chefs. In particular, the ability to use big data as well as extensive user profiles could enable the robot to personalize dishes better than humans. On top of that, robots can be augmented with additional sensors, giving them access to information humans cannot get, again opening the possibility for a robot to outperform humans.
FIGURE 2: Collage of currently existing robotic chefs (panels a, c, d).
FIGURE: Types of sensors that can be used for recreating taste, as well as their advantages and disadvantages from the point of view of robotics.
FIGURE 4: Different variants of sensor coverage of values of an arbitrary taste parameter with respect to its coverage by human receptors (a-c). The ratio of the crossed and total area under S(x) corresponds to the proposed Effective Human Range Coverage measure of sensor suitability for use in a robotic chef.
TABLE 2: List of data sets containing food, with information on feasible training tasks (due to labeling), type of foods, and size of the data sets. Abbreviations: Multimedia recipe data set with Ingredient Annotation at every Instructional Step; PFID, Pittsburgh fast-food image data set.