Feelit: Combining Compliant Shape Displays with Vision-Based Tactile Sensors for Real-Time Teletaction (2024)

Oscar Yu¹ and Yu She²,* (*Corresponding Author)
¹Oscar Yu is with the School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. oyu@purdue.edu
²Yu She is with the School of Industrial Engineering, Purdue University, West Lafayette, IN 47907, USA. shey@purdue.edu

Abstract

Teletaction, the transmission of tactile feedback or touch, is a crucial aspect of teleoperation. High-quality teletaction feedback allows users to remotely manipulate objects and increases the quality of the human-machine interface between the operator and the robot, making complex manipulation tasks possible. Advances in the field of teletaction for teleoperation, however, have yet to make full use of the high-resolution 3D data provided by modern vision-based tactile sensors. Existing solutions for teletaction lack in one or more areas of form or function, such as fidelity or hardware footprint. In this paper, we showcase our design for a low-cost teletaction device that can utilize real-time high-resolution tactile information from vision-based tactile sensors, through both physical 3D surface reconstruction and shear displacement. We present our device, the Feelit, which uses a combination of a pin-based shape display and compliant mechanisms to accomplish this task. The pin-based shape display utilizes an array of 24 servomotors with miniature Bowden cables, giving the device a resolution of 6x4 pins in a 15x10 mm display footprint. Each pin can actuate up to 3 mm in 200 ms, while providing 80 N of force and 1.5 µm of depth resolution. Shear displacement and rotation are achieved using a compliant mechanism design, allowing a minimum of 1 mm of lateral displacement and 10 degrees of rotation. Real-time 3D tactile reconstruction is achieved with a vision-based tactile sensor, the GelSight [1], along with an algorithm that samples the depth data and tracks markers to generate actuator commands. Through a series of experiments including shape recognition and relative weight identification, we show that our device has the potential to expand teletaction capabilities in the teleoperation space.

I INTRODUCTION

Robotic teleoperation is the process in which a human operator, through a human-machine interface, remotely operates a robotic mechanism, manipulating and sensing objects through the feedback that the interface provides. Robotic teleoperation technologies have found many uses outside of the research space [2]. Medical professionals utilize surgical teleoperation robots to perform surgeries remotely and access hard-to-reach areas of the body, minimizing invasiveness. In austere and hazardous environments such as the reactor cores of nuclear plants or the vacuum of space, teleoperation is a vital tool that keeps human operators safe when carrying out critical tasks.


In the field of robotic teleoperation, there is the concept of ‘telepresence’, in which the human operator sufficiently feels as if they themselves are present where the robot is operating; in other words, the quality of the human-machine interface and the feedback presented are enough to recreate the operational environment to a certain degree [2]. Forms of feedback include, but are not limited to, simple force-feedback transducers on the end effector, Augmented Reality integration, and more. Specifically, our research is focused on ‘teletaction’, a facet of telepresence that encompasses the transmission of cutaneous information of a surface, such as shape or texture, to an operator’s skin [3]. Research into human sensory perception shows that high-quality tactile feedback holds promise, especially in regard to dexterous manipulation and tasks requiring the sensing of minute features [4]. In tasks requiring rich tactile feedback, not only is depth information useful, but shear forces and displacements as well. Humans are able to sense tangential shear forces using a variety of receptors in the skin. Psychophysics research has shown that pattern discrimination and sensitivity to stimuli increase when stimulated at different times, increasing spatial resolution [5].

The inception of vision-based tactile sensors represents a jump in tactile sensing capabilities for robotics. These sensors triumph over traditional methods such as resistive, capacitive, or piezoelectric sensors, as they provide much higher spatial resolution and can supply additional information such as shear forces and contact geometry [6]. Sensors such as the GelSight [1][7] have been used to perform dexterous manipulation tasks such as cable following [8].

The development and application of teletaction systems have also yielded improved human performance and sensing when used with telepresence systems. However, most developments in the teletaction space have their share of drawbacks as well. Some compact wearable systems, such as the Haptic Thimble [9] and others [10][11], are not able to transmit high-fidelity surface information, relying instead on other qualities like force feedback or vibration for teletaction. Recently, Carnegie Mellon’s Future Interfaces Group developed the Fluid Reality haptic fingertip [12], which currently represents the smallest form factor among wearable haptic devices able to transmit depth and contact information. Even still, this solution lacks tactile resolution, utilizing only two levels of depth actuation per pin. Other solutions are able to emulate other facets of telepresence, such as shear forces or stretchable surfaces, but have a more niche application space or a bulky footprint that would be difficult to integrate with telepresence systems [10][13][14].


The currently most popular method of recreating surface geometry at high resolution is the pin-based shape display. This method uses multiple actuators to drive a grid of pins up or down, where the heights of the pins correspond to a particular surface geometry. Past research into pin-based shape displays includes MIT’s inFORM table and TRANS-DOCK system [15][16] and FEELEX [17]. Designs utilizing lower-cost, commercially available components have been demonstrated as well [18].

Pin-based shape displays have their own drawbacks as well. Most designs exhibit a large footprint, inhibiting their use with telepresence systems. Those utilizing pneumatic systems [19] or electromagnetic braking [20], which can combine wearability and fidelity, suffer from reduced refresh rates and low actuator force, limiting their application.

Our research focuses on the development of a miniaturized, low-cost, pin-based shape display with compliant mechanisms that leverages the full capabilities offered by vision-based tactile sensors in robotic telepresence applications. Our contribution to the teleoperation space is our device, Feelit, which performs both physical 3D depth reconstruction and physical shear displacement reconstruction in real time; we demonstrate its advantages in teleoperation by performing psychophysics and teleoperation experiments, including shape recognition and relative weight identification.

II Design Overview


Fig. 2 shows a block diagram overview of the Feelit tactile reconstruction device, which is mounted on the leader arm of the Aloha teleoperation system [21]. The Aloha teleoperation system consists of two ViperX 6-DoF robot arms with parallel-jaw grippers attached to the end effectors. The leader arm is controlled by the user, and the follower arm mimics it. First, the Aloha follower arm grasps an object with the GelSight tactile sensor attached to the end effector. The GelSight captures live video of the deformed gel and transmits the video to the computer over USB. On the computer, the video stream is processed to reconstruct the 3D depth and shear displacement information. An algorithm then determines the actuator commands, in terms of Pulse Width Modulation (PWM) duty cycles, needed to reconstruct the tactile information. The user can specify options to control the scale of reconstruction, as well as the size and location of the sampling area. The individual servo actuation data is communicated over serial USB to the Mini Maestro servo controller, which controls the individual servos in the tactile device. On the pin display, the servos are actuated, applying force onto the Bowden cables attached to the display face. This deforms the silicone gel into which the cable ends are cured (not shown in the figure). The compliant mechanism, which controls the position of the display face, also has servos for actuation. The result is a real-time reconstruction, on the pin display face, of the 3D surface and shear displacements sensed by the GelSight. The form factor of our device is made specifically to interface with the Aloha teleoperation system, with the compliant mechanism and display face attached to the manipulator controls, and the cables leading to the servo housing placed under the grip. The material cost of our device totals less than $500, making it economically competitive with other teletaction solutions. A physical layout of the device is shown in Fig. 3.

II-A Actuator Design & Bowden Cables

Different actuation methods were considered, including standard digital servos, pneumatic actuators [19], electrostatic braking pins [22], and linear actuators. For our design we opted for standard commercial off-the-shelf servo motors, the EMAX ES08MA II 12 g metal gear servos, along with the Pololu Mini Maestro 24-channel USB servo controller, which serves as an interface between the software and the servos. A rack-and-pinion design was employed for the actuation mechanism. The range of motion for the actuation mechanism is 3 mm, with 1.5 µm of resolution. The output force was calculated at 78 N and the actuation speed was measured at 15 mm/s.

We use Bowden cables, specifically the Gold-n-Rod Pushrod System, 36” Cable .032”, to transfer force from the actuators to the display face. These cables are small in diameter and are flexible, allowing for the display face to move for shear displacement reconstruction and for the actuator bodies to be located away from the display face.


II-B Compliant Display Face

The display face is a resin-printed box with a 6x4 grid of holes, through which pins formed by the ends of the Bowden cables extend. The pins are separated by a pitch of 2.6 mm, giving the display a size of 15 mm by 10 mm, approximately the size of a human finger. A 3 mm layer of silicone elastomer, similar to the GelSight face, is cured onto the pins to give the user a continuous surface to feel.

The display face itself is mounted onto a compliant mechanism, which is a modified design based on the HexFlex [23] compliant stage. The compliant stage has 3 degrees of freedom: in-plane lateral movement and rotation about the display face normal. Actuation is achieved by rotating the 3 outer tabs, whose combined motion produces the displacement of the stage. The three outer tabs are connected to specialized servo horns that allow the tabs to rotate even when displaced. Fig. 4 gives a graphical example of stage movements as a result of tab actuation.

An accurate kinematic model is needed to precisely actuate the display; however, analytical modeling of compliant mechanisms is notoriously complex. We opted instead to estimate the kinematics to a reasonable degree of accuracy (<0.1 mm) for our application. We assume that each tab's contribution to the displacement of the compliant stage is independent of the others. Each tab actuation contributes to the lateral and rotational displacement of the compliant stage, which can be modeled with an nth-degree polynomial. The contributions of each tab are then summed to find the total displacement.

\[
\begin{bmatrix} x \\ y \\ \phi \end{bmatrix}
=
\begin{bmatrix}
p_{x_1}(\theta_1) + p_{x_2}(\theta_2) + p_{x_3}(\theta_3) \\
p_{y_1}(\theta_1) + p_{y_2}(\theta_2) + p_{y_3}(\theta_3) \\
p_{\phi_1}(\theta_1) + p_{\phi_2}(\theta_2) + p_{\phi_3}(\theta_3)
\end{bmatrix}
\]

where

\[
p(x) = p_1 x^n + p_2 x^{n-1} + \dots + p_{n-1} x + p_n
\]

and

\[
-1 \leq \theta_{1,2,3} \leq 1
\]

are the normalized servo angles for each actuation tab.

This expression is then used to find the Jacobian, which is used in a vanilla iterative inverse kinematics algorithm to find the servo angles for a desired \(x\), \(y\), and \(\phi\). From test bench measurements (Fig. 5), the compliant display has a minimum of 1 mm of lateral movement in any direction without rotation, and 10 degrees of rotation about the center without any lateral movement, as one affects the range of the other.
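As a concrete illustration, the following Python sketch implements the polynomial forward model and a vanilla Jacobian-based iterative inverse kinematics step. The tab orientations and polynomial coefficients below are placeholder assumptions for illustration only; the actual coefficients are fit from bench calibration of the stage and are not given here.

import numpy as np

# Placeholder geometry and per-tab polynomial coefficients (highest degree first).
# These are illustrative assumptions, not the fitted values from the paper.
ANGLES = np.deg2rad([90.0, 210.0, 330.0])
P = {
    'x':   [np.array([0.05, 0.6 * np.cos(a), 0.0]) for a in ANGLES],
    'y':   [np.array([0.05, 0.6 * np.sin(a), 0.0]) for a in ANGLES],
    'phi': [np.array([0.20, 4.0, 0.0]) for _ in ANGLES],
}

def forward(theta):
    """Stage pose [x, y, phi] from normalized tab angles theta in [-1, 1]:
    each tab's polynomial contribution is evaluated and summed, as in the equation above."""
    return np.array([sum(np.polyval(P[k][i], theta[i]) for i in range(3))
                     for k in ('x', 'y', 'phi')])

def inverse(target, theta0=None, iters=50, tol=1e-4, eps=1e-4):
    """Vanilla iterative IK: numerically estimate the Jacobian of forward()
    and take Newton-style steps toward the desired [x, y, phi]."""
    theta = np.zeros(3) if theta0 is None else np.array(theta0, dtype=float)
    for _ in range(iters):
        err = target - forward(theta)
        if np.linalg.norm(err) < tol:
            break
        J = np.zeros((3, 3))
        for j in range(3):                       # central-difference Jacobian
            d = np.zeros(3); d[j] = eps
            J[:, j] = (forward(theta + d) - forward(theta - d)) / (2 * eps)
        theta = np.clip(theta + np.linalg.pinv(J) @ err, -1.0, 1.0)
    return theta

# e.g. tab angles for 0.5 mm right, 0.2 mm up, and 2 degrees of rotation
print(inverse(np.array([0.5, 0.2, 2.0])))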

II-C Depth Reconstruction

To understand the advantage vision-based tactile sensing gives over traditional tactile sensors such as capacitive sensors, a brief explanation of the sensor is needed. The sensor used in this research is the GelSight. It consists of a silicone elastomer face coated with a gray or silver specular paint, which forms the tactile sensing area. An array of red, green, and blue LEDs illuminates the elastomer from different directions parallel to the face. When the elastomer comes into contact with an object, it deforms, and the illuminated deformed surface is captured by a camera. This image is then used to estimate a depth map using the photometric stereo technique [24], which creates a height function \(z_i = f(x_i, y_i)\) mapping a pixel location \((x_i, y_i)\) to a depth \(z_i\). A quick overview of the photometric stereo technique is as follows: first, a neural network is used to learn a color-to-gradient mapping from each R, G, B value at \((x, y)\) to its corresponding deformed surface gradient \((G_x^i, G_y^i)\). The training data are generated by making impressions of a 3D-printed sphere of known diameter onto the gel and assigning the surface gradients manually. After the neural network learns the color-to-gradient mapping, a 2-D fast Poisson solver is used to spatially integrate the surface gradients to obtain the depth map. The result is a depth map of the elastomer surface deformed by the object, with a resolution of 240 by 320 pixels over an 18 by 24 mm sensing area. Vision-based tactile sensors are able to sense much finer details in depth and texture compared to traditional contact sensors, and can estimate shear forces and displacements using markers on the silicone face [1].
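To make the integration step concrete, the sketch below recovers a height map from the per-pixel surface gradients by solving the Poisson equation in the frequency domain. It is a minimal FFT-based variant under a periodic-boundary assumption, standing in for the 2-D fast Poisson solver used in the actual pipeline; the color-to-gradient network that produces gx and gy is not shown.

import numpy as np

def integrate_gradients(gx, gy):
    """Solve laplacian(z) = d(gx)/dx + d(gy)/dy for the height map z,
    given per-pixel surface gradients gx, gy of shape (H, W).
    Periodic boundaries are assumed for this FFT-based sketch."""
    h, w = gx.shape
    # divergence of the gradient field (backward differences)
    f = np.zeros_like(gx, dtype=float)
    f[:, 1:] += gx[:, 1:] - gx[:, :-1]
    f[1:, :] += gy[1:, :] - gy[:-1, :]
    # Fourier symbol of the 5-point discrete Laplacian
    kx = 2.0 * np.pi * np.fft.fftfreq(w)
    ky = 2.0 * np.pi * np.fft.fftfreq(h)
    KX, KY = np.meshgrid(kx, ky)
    denom = 2.0 * (np.cos(KX) - 1.0) + 2.0 * (np.cos(KY) - 1.0)
    denom[0, 0] = 1.0                  # avoid division by zero at DC
    z_hat = np.fft.fft2(f) / denom
    z_hat[0, 0] = 0.0                  # absolute height offset is unobservable
    return np.real(np.fft.ifft2(z_hat))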

II-D Marker Tracking & Correction

To provide shear displacement information, a grid of markers is needed on the gel face of the GelSight sensor. Accurate marker tracking is needed both for shear displacement calculation and to provide masking information for the depth estimation, as the different color of the markers interferes with the depth calculation algorithm. Tracking is performed with the mean-shift algorithm provided in the codebase for the GelSight sensor. Mean shift was chosen as it is less prone to drift and edge-case errors compared to optical flow tracking, and is comparable in computational speed, with mean shift operating at an average of 16.76 frames per second (FPS) and optical flow at 15.95 FPS over 500 frames. The mean-shift algorithm iteratively moves each tracked marker to the nearest high-density region in an altered grayscale image. True marker locations in the input image are represented as high-density regions, and ideally the last known tracked marker locations are locally near their corresponding true markers, allowing the algorithm to converge.

The mean-shift algorithm is still prone to errors during transient moments of high displacement, with markers either losing track or converging onto each other. To combat this, a marker correction algorithm is implemented to identify and correct untrustworthy markers. The pseudocode for the marker error correction algorithm is given in Algorithm 1.

img ← GelSight_Live()
bin_img ← cv2.adaptiveThreshold(cv2.cvtColor(img))
marker_list ← mean_shift_alg(img)
trust_idx ← np.ones()
for i in length(marker_list) do
    curr_marker ← marker_list(i)
    if curr_marker location in bin_img is 1 or |curr_marker.vec| > 30 then
        trust_idx(i) ← 0
    end if
    for adj_marker in surrounding_markers do
        d ← |curr_marker.vec − adj_marker.vec|
        if d > max_diff then
            trust_idx(i) ← 0
            break
        end if
    end for
end for
for j where trust_idx(j) = 0 do
    marker_list(j) ← interpolate(marker_list(trust_idx))
end for

We first parse through each estimated marker and determine trustworthiness with several criteria. First, we check whether the currently evaluated marker is located on a dark spot in the image, which generally corresponds to a marker location. This is done by converting the image to grayscale and then to a binary image via dynamic thresholding. Dark pixels in the image are represented as zeros and generally correspond to marker shapes. We cannot use this binary image directly as a mask, because large displacements are also present in it. Next, we check whether the vector displacement of the current marker generally agrees with the displacements of surrounding markers. Local markers in the gel should have roughly similar displacements, and any outliers are likely markers that have not converged properly. The displacements of the four markers surrounding the current marker (up, down, left, right) are compared against it, and if there is too much discrepancy between the current marker and an adjacent marker, it is marked as untrustworthy. We determined through testing that marker displacements should not vary by more than 15 pixels from each adjacent marker during nominal operation. Finally, an upper limit on the marker displacement is imposed to catch any markers that have converged onto different centers. After all the markers have been categorized, we linearly interpolate displacements for the untrustworthy markers from the remaining set of trusted markers. As long as the interpolation places the corrected markers near their corresponding true markers, the mean-shift algorithm will converge accurately.
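A runnable version of this trust check and correction, assuming the tracked markers are stored as (x, y) centers on a regular row-major grid, might look like the following sketch. The thresholds mirror Algorithm 1; the data layout, function signature, and use of SciPy for the linear interpolation are our own illustrative assumptions.

import numpy as np
import cv2
from scipy.interpolate import griddata

MAX_NEIGHBOR_DIFF = 15    # px, nominal limit between adjacent markers (from testing)
MAX_DISPLACEMENT = 30     # px, upper bound on any single marker's displacement

def correct_markers(img, init_pos, tracked_pos, grid_shape):
    """Flag untrustworthy markers and replace their displacements by interpolating
    over trusted neighbors. init_pos and tracked_pos are (N, 2) arrays of (x, y)
    marker centers laid out row-major on a grid_shape = (rows, cols) grid."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # dynamic threshold: dark marker blobs become 0, everything else becomes 1
    bin_img = cv2.adaptiveThreshold(gray, 1, cv2.ADAPTIVE_THRESH_MEAN_C,
                                    cv2.THRESH_BINARY, 31, 10)
    disp = (tracked_pos - init_pos).astype(float)
    rows, cols = grid_shape
    trust = np.ones(len(tracked_pos), dtype=bool)
    for i, (x, y) in enumerate(np.round(tracked_pos).astype(int)):
        x = np.clip(x, 0, bin_img.shape[1] - 1)
        y = np.clip(y, 0, bin_img.shape[0] - 1)
        # untrusted if not on a dark blob, or if the displacement is implausibly large
        if bin_img[y, x] == 1 or np.linalg.norm(disp[i]) > MAX_DISPLACEMENT:
            trust[i] = False
            continue
        # untrusted if it disagrees with any of its 4-connected neighbors
        r, c = divmod(i, cols)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                j = rr * cols + cc
                if np.linalg.norm(disp[i] - disp[j]) > MAX_NEIGHBOR_DIFF:
                    trust[i] = False
                    break
    # linearly interpolate displacements of untrusted markers from trusted ones
    if (~trust).any() and trust.sum() >= 3:
        for k in range(2):
            disp[~trust, k] = griddata(init_pos[trust], disp[trust, k],
                                       init_pos[~trust], method='linear',
                                       fill_value=0.0)
    return init_pos + disp, trust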


II-E Sampling & Reconstruction

After marker tracking and correction have been applied, a mask is created and used to correct the depth map. The tracked markers are then used to calculate the total displacement and rotation of the sampling grid. The sampling grid provides the coordinates at which the depth map is sampled, with each sample corresponding to one pin actuator on the pin display. The user has the option of specifying the location and size of the grid, as well as the sampling gain. This allows the sensitivity of the haptic response to be scaled, which is useful for sensing minute features or grasping small objects. The sampled depths are then translated to serial commands and sent over serial USB to the Pololu Maestro servo controller, as mentioned earlier. The software loop runs at a frequency of 8 Hz during nominal operation.
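A minimal sketch of this sampling and command step is shown below. The grid construction, gain handling, and servo pulse-width range are illustrative assumptions; the Set Target framing follows the Maestro's documented compact serial protocol (0x84, channel, low 7 bits, high 7 bits, with the target in quarter-microseconds).

import numpy as np
import serial  # pyserial

def sample_depth(depth_map, center, spacing_px, gain=1.0,
                 rows=4, cols=6, max_depth_mm=3.0):
    """Sample the corrected depth map on a user-positioned grid, one sample per pin.
    Returns rows*cols normalized pin extensions in [0, 1]."""
    cy, cx = center
    ys = (cy + spacing_px * (np.arange(rows) - (rows - 1) / 2)).astype(int)
    xs = (cx + spacing_px * (np.arange(cols) - (cols - 1) / 2)).astype(int)
    gy, gx = np.meshgrid(np.clip(ys, 0, depth_map.shape[0] - 1),
                         np.clip(xs, 0, depth_map.shape[1] - 1), indexing='ij')
    samples_mm = gain * depth_map[gy, gx]
    return np.clip(samples_mm / max_depth_mm, 0.0, 1.0).ravel()

def send_to_maestro(port, pin_fractions, pwm_min_us=1000, pwm_max_us=2000):
    """Convert normalized pin extensions to pulse widths and send one Set Target
    command per channel; the pulse-width range assumes a standard hobby servo."""
    with serial.Serial(port, 115200, timeout=0.1) as ser:
        for ch, frac in enumerate(pin_fractions):
            us = pwm_min_us + frac * (pwm_max_us - pwm_min_us)
            target = int(round(us * 4))                 # quarter-microseconds
            ser.write(bytes([0x84, ch, target & 0x7F, (target >> 7) & 0x7F]))

# e.g.: fractions = sample_depth(depth, center=(120, 160), spacing_px=10, gain=1.5)
#       send_to_maestro('/dev/ttyACM0', fractions)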

II-F Device Demonstration

We demonstrate the operation of our pin-based shape display with the GelSight using several different objects, as seen in Fig. 6. The shapes we tested were a sphere (a), cube (b), cylinder (c), and two dots (d), using 3D-printed shapes. The color images are the raw video output from the GelSight sensor, with the black dots being markers for tracking. Overlaid on the image are a white square representing the sampling area, vectors representing the displacement of the markers from their center positions, and values for the total X, Y, and rotational displacement calculated. The depth map of the elastomer surface deformed by the object is shown as a 3D point cloud. The 6 by 4 reconstructed depth is also shown as a grayscale image, with longer pin extensions on the pin display represented as brighter pixels. Finally, the reconstructed depth and shear are shown on the pin display without the gel face. Due to their relatively large size, marker artifacts are still present in the depth image, even with the mask applied.

We also demonstrate the ability to resolve different levels of detail from the depth image by scaling the sampling area. In Fig. 7, we show 3 different levels of magnification at different locations on the sensing area for an object, obtained by adjusting the spacing of the sampling grid. At high pixel spacing, general features of the object are apparent, and as the pixel spacing decreases, finer details start appearing, such as the gap between the two concentric circles.

III Psychophysics and Teleoperation Experiments

We now investigate the performance and utility of Feelit through different haptic experiments with human participants (IRB-2024-433). First, we perform psychophysical experiments without the teleoperation aspect to measure how well humans are able to interpret the tactile information provided by our device. These experiments consist of recognition of simple shapes, shear displacement stimuli, and depth discrimination. Second, we demonstrate the utility of rich tactile feedback with a simple teleoperation task. For the experiment set, we recruited 7 participants (mean age 24.7, SD = 3.4, 2 identifying as female and 5 as male). At the end of each task, participants are debriefed on their results and asked about their thoughts on the device. For all experiments, the sampling area and gain were scaled 1:1 with the reconstruction on Feelit.

III-A Psychophysics Experiment: Simple Shape Presentation

The goal of this task is to evaluate how well participants are able to discern simple tactile stimuli that are sensed by the GelSight and reconstructed on our device. Seven simple shapes are impressed on the GelSight by the experimenter and reconstructed on the device, as seen in Fig. 8. These shapes are a horizontal bar, a vertical bar, two horizontal bars, a diagonal bar, two dots, three dots, and four dots. Shape information is electronically recorded and played back for consistency. In the tutorial phase, participants are presented with each shape at least once and are allowed to repeat any shape they wish for any duration. Once the tutorial phase ends, participants are blindfolded and randomly presented with one shape at a time for 3 seconds. Participants could request to be re-presented with the current shape as many times as they wished. Once a participant gave a prediction, their response was recorded and a new shape was presented. Participants were not told whether they had identified the correct shape. This process is repeated 3 times per shape, for a total of 21 presentations per participant.


III-B Psychophysics Experiment: Simple Displacement Recognition

The goal of this task is to evaluate how well participants are able to discern displacement information that is sensed by the GelSight and reconstructed on our compliant mechanism. Six distinct displacement stimuli are defined for this experiment, as seen in Fig. 9, and are simulated directly in software. These stimuli are the display face moving up (distal) relative to the participant, down (proximal), left, right, and rotating clockwise and counterclockwise. The experiment is conducted in the same manner as the first experiment. Again, each stimulus is presented 3 times in random order, for a total of 18 presentations per participant.

III-C Psychophysics Experiment: Object Depth Discrimination.

The goal of this task is to evaluate how well participants are able to differentiate between different depth displacements of an object reconstructed on the display. A sphere is impressed onto the GelSight at depths of 0.5, 1.0, 1.5, and 2.0 mm, as seen in Fig. 10, with the response recorded in software for repeatability. The experiment is conducted in the same manner as the first experiment, with participants receiving a tutorial phase to familiarize themselves. In the experiment phase, participants are shown two of the stimuli at random, with the display reset for 1 second in between, and must determine which has the larger depth or whether they are the same. Each combination of stimuli is shown twice at random points throughout the experiment, for a total of 20 presentations.


III-D Teleoperation Demonstration: Relative Weight Identification Task

The goal of this task is to demonstrate the value of the tactile feedback provided by Feelit in determining object properties. Participants operate the Aloha teleoperation system with the Feelit device attached. Participants are handed three cylinders identical in appearance, two of which have 250 g and 500 g weights embedded inside, and are asked to grasp each with the Aloha robot for a short period of time. We implemented this procedure to eliminate any bias from the participant's individual skill when operating the Aloha device, as well as any visual or auditory cues that may tip off participants. An initial trial period is given to familiarize participants with the teleoperation system and task. Once the experiment begins, participants are handed each object one at a time, in random order, to grasp with the teleoperation system. Participants are instructed to verbally indicate which objects they believe to be the lightest and heaviest, along with their level of confidence, where 0% confidence indicates a complete guess. The task is repeated for 5 trials, with the order of the objects randomized between them.

III-E Results

For the shape recognition experiment, participants were able to discriminate the simple shapes with an average accuracy of 83%, as seen in Fig. 8. Although all participants were able to sense the display face, we observed that those with smaller fingers were more confident about their answers, as they could move their fingers around the sensing area. This was especially true for the double horizontal bar, which requires the widest sensing area of all the shapes. Participants commented that although they could feel the frontmost bar of the double horizontal bar shape, the rear bar was more subtle, which possibly contributed to the confusion between the two. This is supported by the fact that the 2 horizontal bars were misidentified as the single horizontal bar more often than the reverse. Participants were quick to identify distinct shapes such as the diagonal bar or the three dots, but more often than not required a re-presentation for similar shapes such as the 2 dots and the vertical bar, or the single and double horizontal bars. Since the gel surface of the pin display provides some degree of 'smoothing' or 'interpolation' between adjacent pins, this may have led participants to mistake the 2 dots for the vertical bar, but not the other way around.

Participants also performed well on the shear displacement recognition, with an average accuracy of 86.5%, as seen in Fig. 9. Notably, since rotating the display requires more servo actuation, some participants were able to distinguish lateral movement from rotation based on the magnitude of vibration from the servos, even with hearing protection. They were still able to discern clockwise from counterclockwise rotation using tactile sensing only, which is independent of vibrational cues. Participants were overall more confident in predicting shear displacements than shapes. Participants seemed slightly biased toward the rightward displacement stimulus, which can be seen in its relatively higher number of misidentifications. A possible explanation is that all participants were right-handed and used the device as such, which may have influenced their decision making.


For the object depth discrimination (Fig. 10), participants were able to discern the difference between spheres with a 1.5 mm difference 100% of the time. With a 1 mm difference, discrimination accuracy drops to 96.4%, and with only a 0.5 mm difference, to 78.6%. For sphere presentations with identical depths, participants were not able to tell whether the depths were the same or different, except for the 0.5 mm sphere, which was identified correctly 92.9% of the time.

For the relative weight identification task (Fig. 11), participants were 91.43% accurate in identifying the correct relative weights. In trials where both answers were correct, participants were on average 91.33% confident in their answers. Participants confused the medium and lightest objects only 8.57% of the time, while correctly identifying the heaviest object in all trials. For trials where the lightest object was misidentified, participants had lower confidence in their answer, at 76%. This suggests a correlation between the confidence of a participant's answer and their accuracy, although more data are needed to evaluate this hypothesis, as participants were overwhelmingly able to identify the correct weights. The test was also conducted for all participants with Feelit disabled; in those tests, participants were neither able to determine the correct relative weights nor confident in their guesses. This, together with the high confidence of participants' answers during correct identification, shows that participants were able to utilize the tactile feedback of Feelit in their decision making to determine the objects' relative weights.

IV Limitations & Future Work

The most apparent limitation of our design is the update frequency of our software loop, due to the heavy computational cost of the depth estimation algorithm and, in part, the marker tracking and error correction. Possible optimizations are to train a smaller neural network for depth estimation, or to estimate the tactile response from the difference image only. Improving the marker tracking and error correction algorithm, or using fewer markers on the GelSight pad, can also speed up computation.

For the physical hardware, although Feelit was small enough to maneuver the leader arm without much difficulty, the servo mechanism housing on the bottom may impede movements when positioning the end effector near the base or low to the ground. Of course, making the device smaller and lighter is another avenue of improvement. Designs to mount the servo mechanisms on the robot links themselves or on opposing sides to balance the weight were considered, but the current design was chosen to balance portability and ease of maintenance, as it is a working prototype. Potential solutions include utilizing smaller actuators or mechanisms, like piezoelectrics. The limited capabilities of the Aloha Teleoperation system also constrained the type of teleoperation experiments conducted. For example, the relatively low precision of Aloha made it difficult for participants to grasp objects in a repeatable manner, and the low actuation force of the gripper made it hard to grasp heavy objects. These limitations resulted in us changing our experimental procedures to maintain consistency. The device, although tailored for use with the Aloha system, can easily be redesigned to fit other similar teleoperation setups that may offer better capabilities. We plan to continue performing more teleoperation tasks to further evaluate the value of the tactile feedback given by our device.

Increasing the resolution of the pin display is another avenue of advancement. We are reaching the physical limit of pin density with Bowden cables, but have yet to fully utilize the precision offered by the GelSight sensor. Increasing pin density without sacrificing other capabilities, such as depth resolution and actuation force, remains an open research problem.

V Conclusion

In this paper we present Feelit, our teletaction device that can provide haptic depth reconstruction and shear displacement information to the user. We use this device in conjunction with the GelSight vision-based tactile sensor to relay high-quality tactile feedback to the user in real-time. We accomplish this through miniaturizing a pin-based shape display with Bowden cables and low-cost actuators, and by employing a compliant mechanism to move the display face. Through psychophysics experiments and a relative weight identification task for teleoperation, we show that Feelit is able to effectively convey high quality tactile information to the user in real-time, and has the potential to expand the teleoperation space.

References

  • [1] W. Yuan, S. Dong, and E. H. Adelson, “GelSight: High-resolution robot tactile sensors for estimating geometry and force,” Sensors, vol. 17, no. 12, p. 2762, 2017.
  • [2] J. Cui, S. Tosunoglu, R. Roberts, C. Moore, and D. W. Repperger, “A review of teleoperation system control,” in Proceedings of the Florida Conference on Recent Advances in Robotics. Citeseer, 2003, pp. 1–12.
  • [3] R. Fearing, G. Moy, and E. Tan, “Some basic issues in teletaction,” in Proceedings of the International Conference on Robotics and Automation, vol. 4, 1997, pp. 3093–3099.
  • [4] S. J. Lederman and R. L. Klatzky, “Sensing and displaying spatially distributed fingertip forces in haptic interfaces for teleoperator and virtual environment systems,” Presence, vol. 8, no. 1, pp. 86–103, 1999.
  • [5] J. Dargahi and S. Najarian, “Human tactile perception as a standard for artificial tactile sensing—a review,” The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 1, no. 1, pp. 23–35, 2004. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rcs.3
  • [6] S. Wang, Y. She, B. Romero, and E. H. Adelson, “GelSight Wedge: Measuring high-resolution 3D contact geometry with a compact robot finger,” CoRR, vol. abs/2106.08851, 2021. [Online]. Available: https://arxiv.org/abs/2106.08851
  • [7] S. Wang, Y. She, B. Romero, and E. Adelson, “GelSight Wedge: Measuring high-resolution 3D contact geometry with a compact robot finger,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 6468–6475.
  • [8] Y. She, S. Wang, S. Dong, N. Sunil, A. Rodriguez, and E. H. Adelson, “Cable manipulation with a tactile-reactive gripper,” CoRR, vol. abs/1910.02860, 2019. [Online]. Available: http://arxiv.org/abs/1910.02860
  • [9] M. Gabardi, M. Solazzi, D. Leonardis, and A. Frisoli, “A new wearable fingertip haptic interface for the rendering of virtual shapes and surface features,” in 2016 IEEE Haptics Symposium (HAPTICS), 2016, pp. 140–146.
  • [10] S. B. Schorr, Z. F. Quek, W. R. Provancher, and A. M. Okamura, “Tactile skin deformation feedback for conveying environment forces in teleoperation,” in Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction Extended Abstracts, ser. HRI ’15 Extended Abstracts. New York, NY, USA: Association for Computing Machinery, 2015, pp. 195–196. [Online]. Available: https://doi.org/10.1145/2701973.2702719
  • [11] R. M. Pierce, E. A. Fedalei, and K. J. Kuchenbecker, “A wearable device for controlling a robot gripper with fingertip contact, pressure, vibrotactile, and grip force feedback,” in 2014 IEEE Haptics Symposium (HAPTICS), 2014, pp. 19–25. [Online]. Available: https://api.semanticscholar.org/CorpusID:30774225
  • [12] V. Shen, T. Rae-Grant, J. Mullenbach, C. Harrison, and C. Shultz, “Fluid Reality: High-resolution, untethered haptic gloves using electroosmotic pump arrays,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, ser. UIST ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3586183.3606771
  • [13] A. Steed, E. Ofek, M. Sinclair, and M. Gonzalez-Franco, “A mechatronic shape display based on auxetic materials,” Nature Communications, vol. 12, no. 1, 2021.
  • [14] P. Zhang, M. Kamezaki, Y. Hattori, and S. Sugano, “A wearable fingertip cutaneous haptic device with continuous omnidirectional motion feedback,” in 2022 International Conference on Robotics and Automation (ICRA), 2022, pp. 8869–8875.
  • [15] S. Follmer, D. Leithinger, A. Olwal, A. Hogge, and H. Ishii, “inFORM: Dynamic physical affordances and constraints through shape and object actuation,” in Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology, ser. UIST ’13. New York, NY, USA: Association for Computing Machinery, 2013, pp. 417–426. [Online]. Available: https://doi.org/10.1145/2501988.2502032
  • [16] K. Nakagaki, Y. R. Liu, C. Nelson-Arzuaga, and H. Ishii, “TRANS-DOCK: Expanding the interactivity of pin-based shape displays by docking mechanical transducers,” in Proceedings of the Fourteenth International Conference on Tangible, Embedded, and Embodied Interaction, ser. TEI ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 131–142. [Online]. Available: https://doi.org/10.1145/3374920.3374933
  • [17] H. Iwata, H. Yano, F. Nakaizumi, and R. Kawamura, “Project FEELEX: Adding haptic surface to graphics,” in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH ’01. New York, NY, USA: Association for Computing Machinery, 2001, pp. 469–476. [Online]. Available: https://doi.org/10.1145/383259.383314
  • [18] C. Wagner, S. Lederman, and R. Howe, “A tactile shape display using RC servomotors,” in Proceedings of the 10th Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (HAPTICS 2002), 2002, pp. 354–355.
  • [19] G. Moy, C. Wagner, and R. Fearing, “A compliant tactile display for teletaction,” in Proceedings of ICRA 2000: IEEE International Conference on Robotics and Automation, vol. 4, 2000, pp. 3409–3415.
  • [20] S. Jang, L. H. Kim, K. Tanner, H. Ishii, and S. Follmer, “Haptic edge display for mobile tactile interaction,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, ser. CHI ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 3706–3716. [Online]. Available: https://doi.org/10.1145/2858036.2858264
  • [21] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” 2023.
  • [22] K. Zhang, E. J. Gonzalez, J. Guo, and S. Follmer, “Design and analysis of high-resolution electrostatic adhesive brakes towards static refreshable 2.5D tactile shape display,” IEEE Transactions on Haptics, vol. 12, no. 4, pp. 470–482, 2019.
  • [23] M. Culpepper, G. Anderson, and P. Petri, “HexFlex: A planar mechanism for six-axis manipulation and alignment,” 2002.
  • [24] R. Woodham, “Photometric method for determining surface orientation from multiple images,” Optical Engineering, vol. 19, no. 1, 1980.