Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Enterprise Social Networking in Alcatel-Lucent
  5. Engage Data Collection
  6. Engage Data Analytics
  7. Related Work
  8. Conclusions and Future Work
  9. References
  10. Biographical Information

In recent years, enterprise social networking (ESN) has gained a foothold in many companies. While there are numerous similarities between enterprise and public online social networks such as Facebook or Twitter, there are also important differences, many of which are driven by the inherent organizational structure of an enterprise, that make ESNs an important area to study in their own right. This paper describes the ESN applications that have been used within Alcatel-Lucent over the past several years, focusing on the most recent. We discuss the tools and methodologies utilized to gain access to Alcatel-Lucent's ESN usage data and to perform various types of analyses and visualizations of that data. We provide some statistics about ESN usage within the company and give insights into some of the questions that ESN tools can help to answer, such as the degree to which ESN applications break down geographic and/or organizational boundaries. Our analyses show that employees in the middle levels of the company hierarchy tend to use ESN the most, and that there is a considerable amount of inter-country communication occurring. © 2014 Alcatel-Lucent.


Introduction

Social networking applications such as Facebook* and Twitter* have become ubiquitous and have changed not only the way many people communicate with each other, but also how information is disseminated around the world. Within corporations, enterprise social networking (ESN) applications have also become quite popular as communications tools. According to a Harvard Business Review study [11] performed in 2010, 79 percent of the 2,100 companies surveyed used or planned to use social media, and 69 percent of the employees within these companies predicted that their use of social media would grow. Over the past five years, Alcatel-Lucent has made three ESN applications available to its employees, two of which are still in use today. In this paper we focus on one of these applications in particular. Alcatel-Lucent is not alone in this space; companies such as HP and IBM have also done considerable work in this area, which we discuss in the “Related Work” section.

There are several underlying points that we will attempt to make within this paper. First, ESN data analysis can yield useful discoveries and help to answer questions about how people communicate within a large, multinational company. Second, ESN data analysis is not an easy endeavor: obtaining the data, understanding the limitations and nuances of the available application programming interfaces (APIs) and tools, and navigating the hazards of data privacy can all be quite challenging. Lastly, ESN data analysis is an iterative process. In our case, the original set of questions that drove our analyses led, several times, to more questions and more analyses.

Panel 1. Abbreviations, Acronyms, and Terms
APAC—Asia Pacific and China
API—Application programming interface
CAViTEn—Conversation Analysis and Visualization Tool for Engage
CAViTY—Conversation Analysis and Visualization Tool for Yammer
EAC—Engage Activity Crawler
EJECT—Engage and JSON Experimentation Combo Tool
ESN—Enterprise social network
GUI—Graphical user interface
IDF—Inverse document frequency
INFOCOM—International Conference on Computer Communications
JSON—JavaScript Object Notation
OSN—Online social network
P&P—People and Projects enterprise social networking application
SNA—Social network analysis
SONAR—SOcial Networking Architecture
TF—Term frequency
URL—Uniform resource locator
XML—Extensible Markup Language

In this paper, we present a study of ESN in Alcatel-Lucent, along with analyses of data collected from Engage, an instance of the Jive* ESN application. The data analysis work presented here is the result of a partnership formed in December 2010 between two groups within Bell Labs. The joint work between these two groups produced both a paper [8] that was presented at the IEEE International Conference on Computer Communications (INFOCOM) in April 2013 and a related patent submission [9]. It is expected that this partnership will continue to bear fruit.

We begin by describing the three enterprise social networking tools that were used within Alcatel-Lucent over the past several years, focusing on the most recent, and we provide some statistics about their use. Next, we describe the tools that we developed to collect the ESN data and give a high level description of the associated database architecture. We then provide some insights into several examples of questions that ESN applications can help to answer, and we delve into the various analyses performed along with the tools used in the process. Lastly, we describe some related work in both enterprise and public social networks, and end the paper by providing some overall conclusions of our analyses and describing some related future work.

Enterprise Social Networking in Alcatel-Lucent

Like most companies, Alcatel-Lucent endeavors to foster better communication between its various organizations and, as a relatively large multinational company, to promote better communication between employees in different countries. As one recent article [16] points out, although email is a reasonable medium for certain types of communication, instant messaging, wikis and social media applications are significantly better for collaborative work, and the latter are on the verge of becoming true office productivity tools. One motivation for studying the use of ESN applications within Alcatel-Lucent was to determine whether or not they were truly breaking down organizational and geographic boundaries, which such applications strive to do. Another impetus was to gauge how employees were using the applications. A third objective was to try to use the data as a corpus for other applications that could, for example, recommend people, documents or other objects of interest based on a keyword search. Some of these goals have already been met, some are works in progress, and others are slated for future work.

ESN Applications in Alcatel-Lucent

Since 2008, three enterprise social networking applications have been used within Alcatel-Lucent: People & Projects (P&P), Yammer*, and an instance of Jive known internally as Engage. The data collection and analysis for the last of these, Engage, is the focus of this paper, but the other two were important stepping stones toward the Engage data analyses described herein. P&P [6] was a homegrown, web-based application developed by a small team in Bell Labs. It was specifically designed to facilitate relationship discovery and took a crowdsource-based approach to help build a knowledge base about the company's employees and their associated projects. P&P provided tagging, following, and search capabilities; profile pages with contact information; hierarchical navigation and relationship graphs; a conversation mechanism; activity notifications; and a means to provide feedback. Initially launched in April 2009 for use primarily within Bell Labs, P&P was made available to the entire company a few months later. The launch of Engage in April 2010 as the corporate-sponsored social networking application for Alcatel-Lucent ultimately led to the decommissioning of P&P in July 2011.

A second social networking application used within Alcatel-Lucent is Yammer, an enterprise microblogging tool that was made available to employees in October 2008. Initially, Yammer was limited to microblog-like conversations and following, but features have since been added that provide significantly more collaborative capabilities. Yammer usage grew steadily within Alcatel-Lucent for the first two years, particularly within Bell Labs. Although still available at the time of this writing, Yammer usage has decreased significantly since Engage made its appearance. Data representing approximately two years' worth of Yammer posts were collected and analyzed in 2010 by the same Bell Labs team that developed P&P. This work, along with some artifacts from P&P, paved the way for the Engage analysis work to come.

In April 2010, Alcatel-Lucent launched an instance of Jive known internally as Engage. In addition to its web-based interface, Jive is a highly configurable platform with an extensive set of APIs that provide access to much of the underlying data. Unlike its predecessors P&P and Yammer, Engage (the name we will use henceforth when referring to Jive) was well publicized within Alcatel-Lucent right from the start, and all employees were encouraged to use it. While P&P supported only people, projects, and one type of text-based conversation, Engage supports people, projects, groups, documents and several different types of conversations: discussions, documents, microblogs and blogposts. (Unless otherwise noted, when we refer to documents in Engage, we typically mean the discussions associated with a document posted to Engage rather than the document itself.) Blogposts and microblogs are viewable and can be replied to by any Engage user. When a document is uploaded to Engage, users typically post some information about it, which can later be replied to by other users. Both documents and discussions are typically posted within a particular Engage group or project, and depending on the associated privacy level, other users (as well as the Engage APIs that collect the data) may or may not have access to them.

Four data privacy levels are supported in Engage: open, members-only, private, and secret. Open and members-only data represents the public information viewable by everyone in the company, and therefore is the data used in the analyses presented in this paper. Note that we use the term “publicly available” to mean that the Engage interactions that generated the data are available for viewing by any Alcatel-Lucent employee, but not by people outside of the company. According to Engage administrators, these publicly visible activities account for approximately one-half of the overall content creation activity in Engage.

Engage Usage Within Alcatel-Lucent

Although Engage became available in April 2010, it was not until over a year later that we had the ability to collect and begin analyzing the data. During the 18-month period from July 1, 2011 through the end of 2012, roughly a quarter of a million publicly available Engage activities took place, and 16,000 unique users posted something to Engage at least once, out of approximately 74,000 total company employees and contractors. Other users may have posted to non-publicly-available parts of Engage, or they may have simply viewed posts rather than posting anything themselves; we do not have data about those users and their activities.

Table I shows high-level statistics about the types of posts users make to Engage. Discussions are used the most, with documents, blogposts and microblogs coming in second, third, and fourth, respectively. The ratio of replies to original posts is significantly higher for discussions than for the other activity types, and for documents and microblogs there are fewer replies than new posts.

Table I. Number of Engage activities by post type.

Engage Data Collection

Shortly after Engage was launched within Alcatel-Lucent, the same Bell Labs team that had developed P&P and performed analyses on the Yammer data became interested in analyzing data from Engage, which was considerably “richer.” Unlike Yammer, whose active user base was relatively small and mostly confined to Bell Labs, Engage usage was growing considerably and it was being used across the entire company in many different countries. In order to take advantage of this new source of data, two distinct sets of tools were needed: those that would be used to gather the data and store it in a way that would be conducive to the types of analyses to be performed, and those that would actually analyze the data. We discuss the data gathering tools in this section, and the data analysis tools in the “Engage Data Analytics” section that follows (at the end of which we also provide a summary of all of the tools used).

Due to various constraints and privacy considerations, the Engage data was only accessible via its APIs. To provide an easy way to experiment with the various Engage APIs and to understand what data was available and how it was structured, a home-grown tool named Engage and JSON Experimentation Combo Tool (EJECT) was developed. Following this, an “activity crawler” was developed that used the Engage APIs to retrieve activity and object-related data and store it in an associated MySQL* database.

EJECT: The Engage and JSON Experimentation Combo Tool

The Engage and JSON Experimentation Combo Tool (EJECT) was developed to enable the Bell Labs team to quickly execute dozens of Engage API parameter combinations to help them understand what data was available and how it was structured when returned. EJECT, a web-based application, used a special Engage account when executing user-specified APIs so there was no need for users to log in. This account had access only to publicly available Engage data, and was later reused by the Engage Activity Crawler. EJECT provided a tab that displayed sets of sample Engage API URLs, grouped within sub-tabs that the user could choose from and subsequently modify before execution. When the user clicked a button to execute a given API, the raw JavaScript* Object Notation (JSON) or Extensible Markup Language (XML) results would be displayed. The user would then typically click another button to transform the raw results into one of several different human-readable formats. EJECT also provided a means to translate the 13-digit, UNIX*-based timestamp fields in the API results into a human-readable date and time, and vice versa. Once the Engage Activity Crawler came online (discussed next), EJECT also allowed team members to easily view the raw activity data from any of the crawls or to view simple statistics about how many Engage posting activities were taking place.
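The timestamp translation mentioned above can be illustrated with a short sketch. It assumes that a 13-digit value is a count of milliseconds since the UNIX epoch, expressed in UTC; the function names are hypothetical and are not taken from EJECT.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(ms: int) -> datetime:
    """Convert epoch milliseconds (assumed 13-digit form) to an aware UTC datetime."""
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)

def datetime_to_millis(dt: datetime) -> int:
    """Convert a datetime to epoch milliseconds; naive values are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Integer arithmetic avoids floating-point rounding of the millisecond part.
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000

# The start of the analysis window used in this paper, July 1, 2011 (UTC).
print(millis_to_datetime(1309478400000).isoformat())
```

Working in integer milliseconds in both directions keeps the round trip exact, which matters when a timestamp is reused as the "since" parameter of a later query.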

The Engage Activity Crawler and Associated MySQL Database

Having gained experience with the Engage APIs using EJECT, the next steps were to determine which data was of interest and to develop an associated database schema that would be both flexible and conducive to the types of analyses that were to be performed. The MySQL database ultimately consisted of five main tables that stored activity, user, group, project, and tag information. There were also four ancillary tables: one that assisted in mapping telephone numbers to countries; one that recorded failures when accessing Engage object-related APIs, so that information retrieval would be attempted only a specific number of times for a given object; one that mapped the integer-based Engage object codes to associated object names; and finally, a table generated in another application and updated nightly that provided employee organizational hierarchy information. Together, these tables stored all of the information needed for the analyses to come.

With the database schema in place, the team began developing the Engage Activity Crawler (EAC) and its associated bots. Soon thereafter, we detected several problems with the Engage APIs. There were times when the APIs would return completely unintelligible data for no apparent reason, and other times when the APIs were simply not responsive at all. To combat these problems, “self-healing” logic was added to EAC that greatly reduced the number of missed activities.

EAC works as follows. Each time EAC runs (once per hour), it calls an Engage activity API to retrieve the activities that occurred since the last run, and then it loops through the activities and stores them in the database. It also keeps track of any users, groups, or projects that were referenced, and then calls additional APIs to retrieve information about those objects if they do not already exist in the database. EAC also retrieves the latest set of tags applied to any of the crawled activities and related objects. EAC includes a set of associated bots that call other Engage APIs not only to minimize missed Engage activities, but also to fill in missing object data and refresh certain data that would otherwise become stale. Each of the various bots runs once per hour, and the bots are staggered so that they neither interfere with each other nor put undue stress on the Engage server. From July 2011 to the time of this writing (about 20 months), EAC and its associated bots have crawled over 240,000 activities, over 16,000 unique users, nearly 2,500 groups, over 1,000 projects and over 180,000 tags.
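A single crawl pass of the kind described above might be sketched as follows. This is only an illustration under stated assumptions: the callables `fetch_activities` and `fetch_object`, the in-memory `db` dictionary, and the field names `id`, `timestamp`, and `refs` are all hypothetical stand-ins, not the Engage (Jive) API or the actual EAC schema. The failure counter mirrors the ancillary table that limits how many times retrieval is attempted for a given object.

```python
def crawl_once(fetch_activities, fetch_object, db, since_ms, max_attempts=3):
    """Run one crawl pass and return the timestamp to use for the next pass.

    fetch_activities(since_ms) -> list of activity dicts (hypothetical shape)
    fetch_object(kind, object_id) -> dict describing a user, group, or project
    db -> {"activities": {}, "objects": {}, "failures": {}}
    """
    newest = since_ms
    referenced = set()

    # Store every activity; keying by id makes a re-crawled activity harmless.
    for activity in fetch_activities(since_ms):
        db["activities"][activity["id"]] = activity
        newest = max(newest, activity["timestamp"])
        for kind, object_id in activity.get("refs", []):
            referenced.add((kind, object_id))

    # Look up referenced objects that are not yet known, with a retry cap.
    for key in referenced:
        if key in db["objects"]:
            continue
        if db["failures"].get(key, 0) >= max_attempts:
            continue  # give up on objects that have failed too many times
        try:
            db["objects"][key] = fetch_object(*key)
        except Exception:
            db["failures"][key] = db["failures"].get(key, 0) + 1

    return newest
```

Because the pass only records failures and moves on, an unresponsive or garbled object lookup does not stop activity collection, and a later pass (or a separate bot) can retry it until the cap is reached.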

Engage Data Analytics

Once the Engage Activity Crawler was developed and the associated MySQL database was beginning to grow, analysis of the Engage data commenced. In this section, we first discuss Engage usage and user interactions by country, and then by organizational level. We then describe some text mining analyses that were performed on Engage groups. Throughout this section, we also describe the various third-party tools that were used to perform these analyses. At the end of this section we summarize all of the tools used and describe a homegrown tool developed to provide on-the-fly, web-based Engage analyses.

Engage Usage by Country

During the 18-month time period from July 1, 2011 through the end of 2012, users in over 90 countries had posted at least one item on Engage. Given that ESN applications purport to increase communication and collaboration amongst a company's employee population, we were interested in determining whether Engage was helping to break down the invisible barriers that are often present between employees in different countries. We were also interested in finding out whether particular countries tend to use Engage proportionately more or less than others, perhaps due to differences in corporate or social culture, and whether there are significant differences in the manner in which employees in different countries use Engage.

Table II provides high-level statistics about employee posts and the number of users in the top 20 countries within the aforementioned 18-month timeframe. The posts from these 20 countries account for over 87 percent of all Engage posts, and the employees from these 20 countries account for 80 percent of all users who have posted at least one item to Engage.

Table II. Top 20 countries based on Engage usage.

For the most part, rankings based on the number of posts and rankings based on the number of users are similar. However, employees in some countries, such as the U.S., Canada and the Netherlands, seem to post more items to Engage relative to the number of employees in those countries; conversely, employees in China seem to post fewer. There could be many reasons for this disparity, including cultural differences, the extent to which Engage is promoted, and the accessibility of Engage to the average worker.

Social network analysis of inter-country communication.

Social network analysis (SNA), which has been widely used in the study of social structures such as organizations, provides other interesting ways to look at the interactions between countries in Engage [18]. As online social network (OSN) usage has grown in recent years, SNA techniques have been applied to the study of these networks (see, for example, [23] and [25]). The open source tool Gephi* is well suited to this type of analysis [1, 3, 12]. Gephi gives researchers the ability to explore complex graph-based data structures interactively and visually, and to compute SNA metrics for the graphs. Gephi includes a core software computation and rendering engine, as well as numerous optional add-ons. We used Gephi for several different types of analyses during our Engage data analytics work.

Figure 1 shows a social network graph generated by Gephi depicting country-to-country interactions in Engage for the time period July 1, 2011 through December 31, 2012. The directed edges represent replies from users in one country to posts made by people in either the same country or another country. The edges shown in Figure 1 have been filtered to show only cases where at least 30 replies occurred between a country pair during the time period. Node size represents the relative betweenness centrality of the country in the graph. Countries grouped together by Gephi's clustering algorithm are shown as the same color. Darker, thicker edges indicate more replies from one country to another.
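The edge construction and filtering described above can be sketched in a few lines. This is a minimal illustration, assuming the reply data has already been reduced to (replier country, poster country) pairs; the function name and input shape are hypothetical. The betweenness centrality and clustering steps were performed in Gephi and are not reproduced here.

```python
from collections import Counter

def country_edges(replies, min_replies=30):
    """Aggregate replies into weighted, directed country-to-country edges.

    replies: iterable of (replier_country, poster_country) pairs.
    Returns {(source, target): reply_count} for edges at or above the threshold.
    Self-loops (source == target) are kept, since they represent
    intra-country replies.
    """
    counts = Counter(replies)
    return {pair: n for pair, n in counts.items() if n >= min_replies}

# Toy data: only edges with at least 30 replies survive the filter.
sample = [("FR", "US")] * 30 + [("US", "US")] * 45 + [("IN", "FR")] * 29
print(country_edges(sample))
```

A dictionary of this form can be written out as a weighted edge list and loaded into a graph tool for layout and metric computation.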

Figure 1. Social network analysis graph of country-to-country interactions in Engage.

There are several interesting things to note about Figure 1. First, as the numerous edges near the core of the graph indicate, there is a significant amount of inter-country communication occurring in Engage. We can also see that there is intra-country communication occurring, as indicated by edges that loop back to the same country. This is particularly apparent for the United States and France, but also for Belgium, China, India, and several other countries. Consistent with the post counts in Table II, betweenness centrality tends to follow the relative number of posts, and France and the United States are clearly the two dominant countries in the graph. Finally, Gephi clustered the countries into seven communities. One community includes most Asia Pacific and China (APAC) countries, but the others seem to be widely dispersed geographically.

Inter-country interactions and geographic-based usage in Engage.

As described earlier, Engage users can either create a new piece of content (a new post) or act on an existing piece of content (a reply). When a user responds to an existing piece of content, an interaction instance is created. Here we consider the interaction as a directed connection between a user pair, i.e., a directed interaction from user A to B implies that user A responded to the content originated by user B.

One high-level way to consider the extent of inter-country interactions in Engage is to look at the proportion of replies made by people in one country to people in other countries, relative to the total number of replies made by people in the original country. A higher number indicates a greater propensity for inter-country interactions, with any number greater than 0.5 showing that there are more replies to threads started outside the responder's country than inside. Table III shows this data for the top 20 countries in terms of number of Engage users. The first thing to note is that there are many inter-country replies. With the exception of the United States and China, users in all countries replied more often to posts that originated in other countries than to posts from their own country. Even for China and the United States, there were almost as many inter-country replies as intra-country replies. While this result is suggestive of strong inter-country communication, it could be influenced by the fact that employees in countries like France and the United States post more to Engage overall than employees in other countries. Therefore, it is reasonable to expect that many of the other countries would be posting inter-country replies to the posts made by people in France and the United States.
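The proportion described above is straightforward to compute once each reply is reduced to a pair of countries. The following sketch assumes that reduction has already been done; the function name and input shape are illustrative only.

```python
from collections import defaultdict

def inter_country_proportion(replies):
    """Return, per replier country, the share of replies sent to other countries.

    replies: iterable of (replier_country, poster_country) pairs.
    A value above 0.5 means more replies went to posts that originated
    outside the replier's country than inside it.
    """
    total = defaultdict(int)
    external = defaultdict(int)
    for replier_country, poster_country in replies:
        total[replier_country] += 1
        if replier_country != poster_country:
            external[replier_country] += 1
    return {country: external[country] / total[country] for country in total}

# Toy data (not taken from Table III).
sample = ([("US", "US")] * 6 + [("US", "FR")] * 4
          + [("BE", "FR")] * 3 + [("BE", "BE")] * 1)
print(inter_country_proportion(sample))
```

With the toy data, the first country sends 4 of its 10 replies abroad and the second sends 3 of its 4, so only the second exceeds the 0.5 mark.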

Table III. Proportion of inter-country replies to all replies for top Engage usage countries.

Another way to determine how different countries interact with one another in Engage is to take each pair of distinct countries from the top 20-country list and compute the number of interactions for each type of activity, and for all activities. Thus, for each type of activity, we have a 20×20 matrix where each entry is the total number of interactions from an arbitrary user in country C to an arbitrary user in country D. Below we show an analysis performed using R [20], which is a free software programming language and environment for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques and is highly extensible through the use of user-submitted packages for specific functions or areas of study. We used R to perform a wide range of data analytics, from simple exploratory data analysis, such as answering descriptive questions, to deeper analysis, such as statistical modeling of user interaction graphs using organization attributes.

Figure 2, generated using R, shows the number of interactions versus the product of the number of users for each pair of countries on a log base 2 scale, where each circle represents a country pair, and the straight line has a slope of 1, indicating a proportional relationship between the number of interactions and the product of the number of users. The data follow this proportional relationship reasonably well, which is understandable since more users create more opportunities for interactions. Hence, in the following analysis, in order to treat each country equally and avoid overweighting the countries with the most users, we normalize the number of interactions by the number of user pairs for each distinct country pair.
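The normalization can be written out explicitly. If the number of interactions between countries C and D is roughly proportional to the product of their user counts, then on a log scale the points fall along a line of slope 1, and dividing by that product removes the size effect. The sketch below is an illustrative Python rendering (the original analysis used R); the function names and input shapes are assumptions.

```python
import math

def normalized_interactions(interactions, users):
    """Divide each inter-country interaction count by the number of user pairs.

    interactions: {(country_c, country_d): count}
    users: {country: number_of_users}
    For distinct countries, the number of possible user pairs is n_c * n_d.
    """
    return {
        (c, d): n / (users[c] * users[d])
        for (c, d), n in interactions.items()
        if c != d
    }

def log2_points(interactions, users):
    """Return (log2 user pairs, log2 interactions) points, as in a log-log plot."""
    return [
        (math.log2(users[c] * users[d]), math.log2(n))
        for (c, d), n in interactions.items()
        if c != d and n > 0
    ]

# Toy data, not taken from the paper.
inter = {("US", "FR"): 800, ("FR", "US"): 400, ("US", "US"): 5000}
users = {"US": 4000, "FR": 2000}
print(normalized_interactions(inter, users))
```

Intra-country entries are skipped here because the text normalizes only distinct country pairs; the count of user pairs within one country would need a different formula.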

Figure 2. Number of Engage interactions versus number of possible Engage user pairs.

To understand how the degree of interaction between two countries is influenced by the countries' geographic distance, we also correlate our interaction measure with the geodesic distance between a country pair. The geodesic distance between two countries is computed using the centroid location of each of the two countries. Figure 3 shows a scatter plot of the normalized interactions versus the geodesic distance between the country pairs, where each circle indicates a country pair and the line shows a linear fit to the data. It shows a downward trend in interactions as the geodesic distance increases. A t-test of the line slope gives a p-value below the conventional 0.05 threshold, which indicates there is a statistically significant correlation between the interactions and geodesic distances. The importance of geographic distance on user interactions is reminiscent of what has been observed in public social networks [22]. However, statistical analysis also gives an R-squared value (coefficient of determination) of 0.05, indicating that 95 percent of the variability is residual and not explained by the geodesic distances.
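The two computations in this paragraph, a distance between country centroids and a linear fit with its coefficient of determination, can be sketched as follows. The distance function uses the haversine great-circle formula on a spherical Earth, which is an approximation swapped in for illustration; the paper does not state which geodesic method was used. The fit is ordinary least squares written out by hand, and all names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def linear_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance explained by x
    return slope, intercept, r_squared
```

An R-squared of 0.05 from such a fit means the line accounts for 5 percent of the variance in the normalized interactions, which is why a slope can be statistically distinguishable from zero while still explaining little of the overall pattern.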

Figure 3. Normalized interactions versus normalized geodesic distance between country pairs.

Changes over time in inter-country reply patterns.

Another way to consider whether Engage is helping to break down inter-country communication barriers is to investigate the changes in inter-country reply patterns over time. To do this, we compared the time period July 1, 2011 through December 31, 2011 to the same time period one year later. For the time period in 2011, across all countries, 59 percent of replies were made to posts that originated in a country other than the replier's own. In 2012, the percentage increased to 61 percent, suggesting slightly more inter-country communication over time.

As described above, SNA techniques can be useful in comparing the two time periods of interest. In particular, we wanted to see whether there were changes in key SNA metrics, when looking at the 2011 data versus the 2012 data, that might indicate different underlying network activity. Table IV shows the main results of these analyses. The 2012 graph has fewer nodes (countries) and fewer edges (at least one reply from one country to another country), reflecting a decrease in the number of publicly available contributions to Engage from 2011 to 2012. Additionally, average degree is slightly lower for 2012 than for 2011, indicating that people in a given country replied to about one fewer country in 2012 than in 2011. Again, this most likely is due to the small decline in Engage contributions. Other metrics are only slightly different between the two years, suggesting that there is little difference in the underlying nature of the reply patterns between countries from 2011 to 2012.
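The simplest of the metrics discussed above (node count, edge count, and average degree) can be computed directly from a list of country-to-country reply edges. The sketch below assumes that "average degree" means the number of directed edges divided by the number of nodes, that is, the average number of distinct countries replied to; this definition is an assumption and may differ from the one used by the tool that produced Table IV.

```python
def graph_summary(edges):
    """Summarize a directed graph given as (source, target) pairs.

    An edge exists if at least one reply went from source to target;
    duplicate pairs are counted once.
    """
    unique_edges = set(edges)
    nodes = {country for edge in unique_edges for country in edge}
    avg = len(unique_edges) / len(nodes) if nodes else 0.0
    return {"nodes": len(nodes), "edges": len(unique_edges), "avg_out_degree": avg}

# Toy data, not taken from Table IV.
print(graph_summary([("US", "FR"), ("FR", "US"), ("US", "IN"), ("US", "FR")]))
```

Running the same summary on the edge lists for two time periods gives a like-for-like comparison of the kind described in the text.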

Table IV. Social network analysis metrics for inter-country replies, 2011 versus 2012.

Engage Usage by Organizational Level

One important distinction between enterprise social networks and public online social networks is that users of the former (employees) are not necessarily equivalent peers. Rather, the corporation has a tree-like hierarchy where each individual user resides at a certain position (or level). We refer to this structure as the organization graph. The resulting organization graph imposes certain relations between users in their interactions in the social network, for example, a manager-subordinate relation or coworker relation. We were interested in studying how the corporate hierarchy affects user interaction in the social network. Table V provides information about Engage posts for employees at each level within the Alcatel-Lucent organization. Note that the data in the table only includes posts made by Engage users who are still in Alcatel-Lucent at the time of this writing and reflects the current organization level of the user.

Table V. Engage statistics associated with the Alcatel-Lucent organization graph.

Looking at the overall average posts per user at each level, with the exception of the chief executive officer (CEO) at the top (level 0), it appears that people at level 3 post more than everyone else, with posting trailing off slowly from there as the organizational level increases. The data in the table shows that employees at some levels post proportionally much more than employees at other levels. The data also suggests that a tool like Engage provides the biggest benefits to employees at the middle levels of an organization, which may be why we see more usage at those levels. It also appears that the people who report directly to the CEO have not yet adopted Engage very heavily. They may not need to use an ESN, in the sense that there are only a limited number of other people at their level and they probably already know one another. In addition, it appears that few of these people use the ESN to communicate with others below them in the reporting chain. The former Alcatel-Lucent CEO, who led the company during our data analysis timeframe, maintained an active blog available to all employees, which accounted for most of his Engage usage.

For each user pair, we define the organizational hierarchy distance as the number of hops from each of the two users to their nearest common ancestor in the corporate hierarchy, whichever is larger. For example, a user pair with distance 1 would be either peers or a manager-subordinate pair. Figure 4 shows the frequency of the hierarchy distance of interacting user pairs in Engage. The hierarchy distance for interacting user pairs in Engage is concentrated in the range of 4 to 7, though the percentages in the figure are skewed somewhat by the fact that there are many more employee pairs at these distances than at other distances within the company. These results indicate that an ESN tool such as Engage does stimulate interactions between employees who are far apart in the corporate hierarchy, as opposed to a more traditional communication tool such as email, where fewer messages tend to cross organization boundaries. The same observations are made in [8] (see Figure 2 and the text therein).
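The distance definition can be made concrete with a small sketch. It assumes the organization graph is a tree stored as a mapping from each employee to his or her manager, with the person at the top absent from the keys; the names and data are hypothetical.

```python
def chain_to_root(user, manager_of):
    """Return [user, manager, manager's manager, ...] up to the top of the tree."""
    chain = [user]
    while chain[-1] in manager_of:
        chain.append(manager_of[chain[-1]])
    return chain

def hierarchy_distance(a, b, manager_of):
    """Larger of the two hop counts from a and b to their nearest common ancestor."""
    hops_from_b = {person: hops for hops, person in enumerate(chain_to_root(b, manager_of))}
    for hops_from_a, ancestor in enumerate(chain_to_root(a, manager_of)):
        if ancestor in hops_from_b:
            return max(hops_from_a, hops_from_b[ancestor])
    raise ValueError("users do not share a common ancestor")

# Toy hierarchy: a CEO, two vice presidents, one director, three engineers.
manager_of = {
    "vp1": "ceo", "vp2": "ceo",
    "dir1": "vp1",
    "eng1": "dir1", "eng2": "dir1", "eng3": "vp2",
}
print(hierarchy_distance("eng1", "eng2", manager_of))  # peers
print(hierarchy_distance("eng1", "dir1", manager_of))  # manager-subordinate pair
print(hierarchy_distance("eng1", "eng3", manager_of))  # common ancestor is the CEO
```

In the toy hierarchy, the two peers and the manager-subordinate pair both have distance 1, matching the example in the text, while the pair whose only common ancestor is the CEO has distance 3 because the longer of the two paths has three hops.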

Figure 4. Percentage of interactions versus average hierarchical distance of interacting user pairs.
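The hierarchy distance defined above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in our analysis: the `manager_of` mapping and the employee names are hypothetical, and we assume that a user counts as their own ancestor at zero hops, so that a manager-subordinate pair has distance 1.

```python
def chain_to_root(user, manager_of):
    """Return [user, manager, manager's manager, ...] up to the top of the hierarchy."""
    chain = [user]
    while manager_of.get(chain[-1]) is not None:
        chain.append(manager_of[chain[-1]])
    return chain


def hierarchy_distance(a, b, manager_of):
    """Larger of the two hop counts from a and b to their nearest common ancestor."""
    hops_from_b = {person: hops for hops, person in enumerate(chain_to_root(b, manager_of))}
    # Walking up from a, the first person also found above b is the nearest common ancestor.
    for hops_from_a, person in enumerate(chain_to_root(a, manager_of)):
        if person in hops_from_b:
            return max(hops_from_a, hops_from_b[person])
    raise ValueError("users share no common ancestor")


# Hypothetical reporting chain: each employee maps to their direct manager.
manager_of = {
    "ceo": None,
    "vp1": "ceo",
    "vp2": "ceo",
    "dir1": "vp1",
    "eng1": "dir1",
    "eng2": "dir1",
    "eng3": "vp2",
}

peers = hierarchy_distance("eng1", "eng2", manager_of)
manager_pair = hierarchy_distance("dir1", "eng1", manager_of)
cross_org = hierarchy_distance("eng1", "eng3", manager_of)
print(peers, manager_pair, cross_org)
```

In this toy hierarchy, the two engineers under the same director and the director-engineer pair both have distance 1, while engineers in different vice-presidential organizations meet only at the CEO.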

Figure 5 shows a scatter plot of the normalized interactions between the top 20 countries versus the organization distances of country pairs, where the distance is taken as the average hierarchy distance between the user pairs. This is a companion plot to Figure 3, described earlier. As before, each circle indicates a country pair and the line shows a linear fit to the data. The plot shows that there is little correlation between the normalized interactions and organization distances between the country pairs. The Pearson statistical correlation tests (with a p-value of 0.66) confirm our observations here. The lack of correlation can also be explained by the observation in [8] (see Table IV) that the impact of organization distance on user interactions is mostly limited to users who are close organizationally (with a distance less than 4), but the majority of interacting user pairs from different countries are further apart as the range of the x-axis shows in Figure 5.

Figure 5. Normalized interactions versus average hierarchical distance between country pairs.
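For readers who want to reproduce this kind of check, the sketch below computes a Pearson correlation coefficient and the associated t-statistic in plain Python. The data points are invented for illustration and are not the Engage measurements; the function names are ours. Turning the t-statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which statistical packages such as R provide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented example: average hierarchy distance and normalized interactions
# for four hypothetical country pairs.
avg_distance = [4.0, 5.0, 6.0, 7.0]
normalized_interactions = [0.3, 0.1, 0.4, 0.2]

r = pearson_r(avg_distance, normalized_interactions)
t = t_statistic(r, len(avg_distance))
print(r, t)
```

A coefficient near zero, as in this invented sample, corresponds to a flat fitted line and a large p-value, which is the pattern reported for Figure 5.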

Groups and Content in Engage

Thus far, the analyses presented in this paper have focused on patterns in posting behavior, rather than on the content of the posts. We are also interested in characterizing the content of Engage posts—what do people discuss in Engage? Below we describe analyses of text mined from posts within groups in Engage, as examples of content-based analyses of ESN data.

In Engage, any user can create a new group as a venue for employees to discuss a given topic. Table VI shows the top 20 publicly visible groups in Engage out of the 2,417 groups in existence at the time of this writing, along with the number of posts in each of these groups. Further analysis has shown that some of these groups have significantly higher ratios of replies per discussion than others. Understanding why this is so and what encourages greater participation within groups is another potential area of future study. Another thing to consider is that, since any user can create a group, there might be significant overlap between groups. Therefore, one analysis of interest might be a clustering analysis to determine which groups are similar to one another in terms of the content posted to the groups. To perform this sort of analysis, we used the RapidMiner [21] open source data-mining tool. RapidMiner consists of a core analytics engine plus numerous optional extensions and add-ons. RapidMiner provides an interactive graphical user interface (GUI) with over 400 operators that can be used to program complex sequences of data loading, preprocessing, transformation, visualization, modeling, and evaluation actions. It also can generate XML-based scripting code.

Table VI. Top 20 Engage groups based on Engage usage.

To cluster the top 20 Engage groups, we mined the text of all posts made to each group; preprocessed the text by operations such as tokenizing, stemming, and removing stopwords; computed the term frequency-inverse document frequency (TF-IDF) values for remaining terms, based on the corpus of all posts to all Engage groups; and then performed a k-means cluster analysis based on Euclidean distance. The resulting clusters, when k = 4, are also shown in Table VI. Based on this analysis, cluster 0 includes hobby-related groups for learning languages, photography, and swapping goods; cluster 1 contains two product-oriented groups; cluster 2 primarily includes information technology (IT) and infrastructure-related groups; and cluster 3 contains several self-help groups for Engage, Macs*, Windows*, Linux*, and iDevices.
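The pipeline described above was built with RapidMiner operators. As a rough, self-contained illustration of the same steps, the Python sketch below tokenizes text, removes stopwords, computes TF-IDF weights, and runs k-means with Euclidean distance. It is a simplification under stated assumptions: stemming is omitted, the stopword list and group texts are invented, the TF-IDF corpus is just the toy documents rather than all Engage posts, and the centroids are seeded with the first k documents instead of a random start.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "my", "on", "how", "i"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, vectors


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, k, max_iter=100):
    """Plain k-means with Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented group texts standing in for the concatenated posts of each group.
groups = {
    "Photography": "camera lens photo camera tripod",
    "VPN Help": "vpn token login vpn",
    "Photo Swap": "photo camera lens swap",
    "Remote Access": "token vpn login remote",
}

vocab, vectors = tfidf_vectors(list(groups.values()))
labels, centroids = kmeans(vectors, k=2)
for name, label in zip(groups, labels):
    print(label, name)
```

On this toy input the two photography groups fall into one cluster and the two remote-access groups into the other. With real data, the choice of k and the seeding strategy both affect the result, so several runs are usually compared.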

As another step in trying to understand the content of these different groups, we can look at the top terms in the centroid table for each cluster. These stemmed terms are shown in Table VII, and give us the next level of detail about the content in each group cluster. For example, the terms “token” and “vpn” are in the top terms list for Cluster 3, which may indicate that users in some of the groups in this cluster have questions about remotely connecting to Alcatel-Lucent's intranet. More complex analyses of text mined from Engage are in progress and are described briefly in the “Conclusions and Future Work” section.

Table VII. Top terms per cluster for top 20 publicly viewable Engage groups.
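Reading the top terms off a cluster centroid amounts to sorting the vocabulary by centroid weight. The sketch below shows the idea; the vocabulary and the centroid weights are invented for illustration and are not taken from Table VII.

```python
def top_terms(vocab, centroid, n=3):
    """Return up to n terms with the largest positive centroid weights."""
    # Sort by descending weight, breaking ties alphabetically for stable output.
    ranked = sorted(zip(vocab, centroid), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, weight in ranked[:n] if weight > 0]


# Hypothetical vocabulary and centroid for a single help-oriented cluster.
vocab = ["camera", "login", "photo", "token", "vpn"]
centroid = [0.0, 0.2, 0.0, 0.3, 0.5]

print(top_terms(vocab, centroid, n=2))  # ['vpn', 'token']
```

Terms with zero weight are dropped, so a cluster whose centroid has fewer than n non-zero entries simply yields a shorter list.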

Data Analytics Tool Summary

Several different classes of tools were utilized in the data analytics work discussed in this paper, some of which were developed in-house while others were obtained from other sources. Yammer and Engage themselves are more correctly classified as applications rather than tools, and provided the sources of the data that we analyzed. Both the Yammer and Engage Activity Crawlers are examples of data collection tools which retrieve activity information using the Yammer and Engage APIs, respectively, and store the data in MySQL databases. EJECT was developed specifically to aid in experimenting with the Engage APIs, but it was also useful for converting JSON structures into one of several human-readable formats. The R language and RapidMiner are both third-party tools used for data analysis. Conversation Analysis and Visualization Tool for Yammer (CAViTY) and Conversation Analysis and Visualization Tool for Engage (CAViTEn) (discussed in more detail in the next section) are home-grown, web-based visualization tools for Yammer and Engage, respectively. Finally, Gephi is a more general third-party visualization tool. Table VIII provides a synopsis of these tools, ordered chronologically from the time they were first developed and/or used within Alcatel-Lucent.

Table VIII. Data analytics tool summary.

CAViTEn: A Conversation Analysis and Visualization Tool for Engage

CAViTEn and its predecessor, CAViTY, are home-grown, web-based tools which were developed for performing on-the-fly analyses and visualizations for Engage and Yammer data, respectively. Figure 6 shows the upper portion of the main “Analysis and Visualization” tab of CAViTEn. Although all of the underlying data that CAViTEn analyzes is publicly available, some of its analyses would potentially cross data privacy lines if a given CAViTEn user could perform these analyses on any Engage user. For this reason, when users access CAViTEn, they must first authenticate using their corporate credentials, and for the various “user-centric” analyses, they can only perform these analyses on themselves.

Figure 6. The main tab of CAViTEn–a Conversation Analysis and Visualization Tool for Engage.

CAViTEn can perform over 40 different types of analyses; some aggregate across large data sets, while others are specific to a given person, keyword, country or Engage group. CAViTEn makes use of several different third-party rendering tools to display results in different formats, including: pie charts, line charts, bar charts, word clouds, static “spoke-and-wheel” graphs, dynamic graphs, geographic maps, and text. As an example, Figure 7 shows a word cloud generated by CAViTEn based on the Engage posts of one of the authors of this document. When a given analysis is chosen, CAViTEn preselects a set of default parameters that the user can subsequently override. CAViTEn also has the capability to repeat a given analysis at a chosen frequency, and can be accessed in a “widgetized” mode so that CAViTEn graphs can be included within an embedded frame in any web page. Though used mostly by members of our Bell Labs team, some of the graphs and statistical data generated by CAViTEn have been used by other organizations within the company, and there are tentative plans to enhance CAViTEn further and advertise it more widely within the company.

Figure 7. A word cloud generated by CAViTEn based on the Engage posts of one of the authors.

Related Work

Social networking is a broad topic that has become ubiquitous in both public and corporate environments. The next two sections provide details about related work in online social networks and enterprise social networks, respectively.

Online Social Networks

There is a large body of work on online social networks such as Facebook and Twitter, covering a large array of topics including studies on visible interactions and passive actions such as profile browsing [2], inferring social hierarchy from social networks [13], estimating relationship strength from interaction activity (e.g., communication, tagging) and user similarity [28], and correlating activities in social media with user wireless network experience [19]. In addition, there have been studies of profile popularity [24], studies of influential users [26], and time evolutions [15, 27]. A study by Scellato et al. investigated how geographic distance affects social structure for various online social networks [22]. Their results showed that there is a large number of users with short-distance links and that clusters of friends are often geographically close. In addition, they demonstrated that different social networking services exhibit different geo-social properties: OSNs based mainly on location-advertising largely foster local ties and clusters, while services used mainly for news and content-sharing present more connections and clusters over longer distances. In our study, we also observe that country pairs with smaller distances tend to have a larger number of interactions.

Enterprise Social Networks

There are several studies on research prototypes of enterprise social networks. Brzozowski of HP introduced a social media aggregator called WaterCooler to study user behavior [4]. He used case studies to show that geographically dispersed teams are more prone to use enterprise social networking applications. Brzozowski et al. also studied the motivation of employees to contribute to the enterprise social network [5]. They found that visible comments from others were more effective than invisible clicks in motivating an employee to contribute again. In comparison, our research focuses more on analyzing the visible user activities, such as posting and replying, but does not cover user motivation.

Based on a deployed research prototype of an enterprise social network in IBM, DiMicco et al. interviewed employees and found that they were prone to make connections with other employees that they did not know well or at all when using the prototype [10]. In addition, Guy et al. built SOcial Networking Architecture (SONAR), a system that aggregated and shared information across multiple services such as blogs and organization charts [14], thus creating a social network service from pre-existing services.

Burns and Kotval analyzed usage data from Engage to provide insight about how questions are asked and answered in Engage [7]. Their analyses revealed significant differences in conversation patterns between question and non-question discussions. They also presented implications of these findings for the design of social question and answer (Q&A) applications.

Kolari et al. analyzed the graph structure and properties of the use of an internal corporate blogging service [17]. They also studied the overall distribution of interactions across different organizational hierarchy distances. Using the Engage data collected from May through October 2011, Cao et al. studied the impact of a user's organizational position on his/her behavior in an enterprise social network [8]. Their analysis showed that both a user's geo-location and position in the corporate hierarchy were highly significant in predicting their interactions. In general, the closer the users were in geography and in the corporate hierarchy, the more likely that they would interact. They also proposed a formal logistic regression model to quantitatively measure the factors that affect user interaction in the social network. For example, if a pair of users was from the same country, then they were 2.27 times more likely to interact than if they were from different countries. As another example, if the corporate hierarchical distance between a user pair was 2, then the pair was 3.85 times less likely to interact than if the two users were peers or a manager-subordinate pair.
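To make the "times more likely" statements concrete, the sketch below shows how a logistic regression coefficient translates into such a factor. It assumes that the quoted factors are odds ratios, that is, the exponential of a fitted coefficient; the feature names and the intercept are hypothetical, and the exact model specification is given in [8].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def interaction_probability(intercept, coefficients, features):
    """Logistic model: probability that a user pair interacts."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return sigmoid(z)


def odds(p):
    return p / (1.0 - p)


# Coefficients chosen so that exp(coefficient) matches the factors quoted above.
coefficients = {
    "same_country": math.log(2.27),           # odds multiplied by 2.27
    "hierarchy_distance_2": -math.log(3.85),  # odds divided by 3.85
}
INTERCEPT = -6.0  # hypothetical baseline log-odds, not taken from [8]

p_diff_country = interaction_probability(
    INTERCEPT, coefficients, {"same_country": 0, "hierarchy_distance_2": 0}
)
p_same_country = interaction_probability(
    INTERCEPT, coefficients, {"same_country": 1, "hierarchy_distance_2": 0}
)
odds_ratio = odds(p_same_country) / odds(p_diff_country)
print(p_diff_country, p_same_country, odds_ratio)
```

Because interactions between any given pair are rare, the baseline probability is small, and in that regime the odds ratio is close to the ratio of the probabilities themselves.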

Conclusions and Future Work

Alcatel-Lucent has provided its employees with three enterprise social networking tools over the past several years, two of which (Engage and Yammer) are still in use today. Due to data privacy considerations, the volume and richness of the ESN data, and the complexities and nuances of the APIs available to access it, the collection and storage of ESN data can be an arduous task. However, despite the challenges, such data offers an opportunity to better understand how ESN applications are being used within the corporate environment, and helps to answer questions such as whether they are fostering better communication between countries and across different organizational levels.

Our analyses have shown that there is some evidence to support the claim that Engage is helping to break down both the geographic and organizational barriers that typically exist in large, multinational corporations such as Alcatel-Lucent, but whether Engage has done so to a greater extent than other communications vehicles such as email or instant messaging is difficult to prove. The fact that overall Engage usage seems to be declining is likely due, in part, to corporate downsizing, but even if we look specifically at those Engage users who are still with the company, we see a reduction in their number of posts year over year from 2011 to 2012. The data also indicates that, although Engage is being used by many thousands of employees within Alcatel-Lucent, the majority of employees have never posted anything to Engage. That said, many employees are likely gaining value from Engage by viewing the various discussions and documents posted by other people.

One area of ongoing work is to extend the text mining analyses described in the “Groups and Content in Engage” section. Alcatel-Lucent's consulting organization approached Bell Labs in 2012 with two related problems: first, how could they more effectively determine which consultants would be best matches for upcoming engagements, based on the consultants' education and skills, past work history, and previously authored materials? Second, how could they more effectively find material that had been developed for previous engagements that might be repurposed for a current engagement? These questions have led to the development of an internal tool called NearToThis that takes text terms, uploaded documents, or uniform resource locators (URLs) as queries and returns recommended Engage groups, documents, and people as output, based on semantic analyses of data mined from Engage and other internal document repositories.

The Alcatel-Lucent Corporate Communications team is interested in a tool that would automatically detect and flag innovative ideas that are posted to certain groups or blogs in Engage so that the Communications team could promote discussion about those ideas. Being able to do this requires an even deeper level of semantic analysis than the work described above, which we are at the early stages of investigating.

Acknowledgements

The authors would like to thank Pete Schott, Li Erran Li, and Sining Chen for their stimulating conversations and other activities that contributed to the work presented in this paper. We would also like to thank Jem Janik and Kevin Joyce of the Engage team who have been very supportive of our work and provided considerable assistance and guidance along the way.

(Manuscript approved October 2013)

*Trademarks

  1. Facebook is a trademark of Facebook, Inc.
  2. Gephi is a trademark of The Gephi Consortium.
  3. JavaScript is a trademark of Oracle America, Inc.
  4. Jive is a registered trademark of Jive Software, Inc.
  5. Linux is a trademark of Linus Torvalds.
  6. Mac is a registered trademark of Apple Inc.
  7. MySQL is a trademark of MySQL AB Limited.
  8. Twitter is a registered trademark of Twitter, Inc.
  9. UNIX is a registered trademark of X/Open Company.
  10. Windows is a registered trademark and Yammer is a trademark of Microsoft, Inc.

References

  • [1]
    M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An Open Source Software for Exploring and Manipulating Networks,” Proc. 3rd Internat. AAAI Conf. on Weblogs and Social Media (ICWSM '09) (San Jose, CA, 2009), pp. 361–362.
  • [2]
    F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing User Behavior in Online Social Networks,” Proc. 9th ACM SIGCOMM Internet Measurement Conf. (IMC '09) (Chicago, IL, 2009), pp. 49–62.
  • [3]
    A. Bruns, “How Long Is a Tweet? Mapping Dynamic Conversation Networks on Twitter Using Gawk and Gephi,” Inform. Commun. Soc., 15:9 (2012), 1323–1351, <http://snurb.info/files/2011/How%20Long%20Is%20a%20Tweet%20%28ICS%20submission%29.pdf>.
  • [4]
    M. J. Brzozowski, “WaterCooler: Exploring an Organization Through Enterprise Social Media,” Proc. ACM Internat. Conf. on Supporting Group Work (Group '09) (Sanibel Island, FL, 2009), pp. 219–228.
  • [5]
    M. J. Brzozowski, T. Sandholm, and T. Hogg, “Effects of Feedback and Peer Pressure on Contributions to Enterprise Social Media,” Proc. ACM Internat. Conf. on Supporting Group Work (Group '09) (Sanibel Island, FL, 2009), pp. 61–70.
  • [6]
    M. J. Burns, R. B. Craig, Jr., B. D. Friedman, P. D. Schott, and C. Senot, “Transforming Enterprise Communications Through the Blending of Social Networking and Unified Communications,” Bell Labs Tech. J., 16:1 (2011), 19–34.
  • [7]
    M. J. Burns and X. P. Kotval, “Questions About Questions: Investigating How Knowledge Workers Ask and Answer Questions,” Bell Labs Tech. J., 17:4 (2013), 43–61.
  • [8]
    J. Cao, H. Gao, L. E. Li, and B. Friedman, “Enterprise Social Network Analysis and Modeling: A Tale of Two Graphs,” Proc. 32nd IEEE Internat. Conf. on Comput. Commun. (INFOCOM '13) (Turin, Ita., 2013), pp. 2382–2390.
  • [9]
    J. Cao, H. Gao, L. E. Li, and B. D. Friedman, “System and Method of Determining Enterprise Social Network Usage,” U.S. Patent Application 13/688,885 (2013).
  • [10]
    J. DiMicco, D. R. Millen, W. Geyer, C. Dugan, B. Brownholtz, and M. Muller, “Motivations for Social Networking at Work,” Proc. ACM Conf. on Comput. Supported Cooperative Work (CSCW '08) (San Diego, CA, 2008), pp. 711–720.
  • [11]
    M. Ennes, “Social Media: What Most Companies Don't Know,” Harvard Business Review, Presentation, 2010, <http://hbr.org/web/slideshows/social-media-what-most-companies-dont-know/1-slide>.
  • [12]
    Gephi Consortium, “Gephi,” <http://gephi.org/>.
  • [13]
    M. Gupte, P. Shankar, J. Li, S. Muthukrishnan, and L. Iftode, “Finding Hierarchy in Directed Online Social Networks,” Proc. 20th Internat. Conf. on World Wide Web (WWW '11) (Hyderabad, Ind., 2011), pp. 557–566.
  • [14]
    I. Guy, M. Jacovi, E. Shahar, N. Meshulam, V. Soroka, and S. Farrell, “Harvesting with SONAR: The Value of Aggregating Social Network Information,” Proc. 26th Annual SIGCHI Conf. on Human Factors in Comput. Syst. (CHI '08) (Florence, Ita., 2008), pp. 1017–1026.
  • [15]
    J. Heidemann, M. Klier, and F. Probst, “Identifying Key Users in Online Social Networks: A PageRank Based Approach,” Proc. Internat. Conf. on Inform. Syst. (ICIS '10) (Saint Louis, MO, 2010).
  • [16]
    R. Holmes, “5 Ways Social Media Will Change the Way You Work in 2013,” Forbes, Dec. 11, 2012, <http://www.forbes.com/sites/ciocentral/2012/12/11/5-ways-social-media-will-change-the-way-you-work-in-2013/>.
  • [17]
    P. Kolari, T. Finin, K. Lyons, Y. Yesha, Y. Yesha, S. Perelgut, and J. Hawkins, “On the Structure, Properties and Utility of Internal Corporate Blogs,” Proc. Internat. Conf. on Weblogs and Social Media (ICWSM '07) (Boulder, CO, 2007).
  • [18]
    A. Marin and B. Wellman, “Social Network Analysis: An Introduction,” The SAGE Handbook of Social Network Analysis (J. Scott and P. J. Carrington, eds.), Sage, London, 2011, pp. 11–25.
  • [19]
    T. Qiu, J. Feng, Z. Ge, J. Wang, J. Xu, and J. Yates, “Listen to Me if You Can: Tracking User Experience of Mobile Network on Social Media,” Proc. 10th Internet Measurement Conf. (IMC '10) (Melbourne, Aus., 2010), pp. 288–293.
  • [20]
    R Foundation, “The R Project for Statistical Computing,” <http://www.r-project.org>.
  • [21]
    Rapid-I, “RapidMiner,” <http://rapid-i.com/content/view/181/190/>.
  • [22]
    S. Scellato, C. Mascolo, M. Musolesi, and V. Latora, “Distance Matters: Geo-Social Metrics for Online Social Networks,” Proc. 3rd Workshop on Online Social Networks (WOSN '10) (Boston, MA, 2010).
  • [23]
    M. Smith, D. L. Hansen, and E. Gleave, “Analyzing Enterprise Social Media Networks,” Proc. 12th IEEE Internat. Conf. on Comput. Sci. and Eng. (CSE '09) (Vancouver, BC, Can., 2009), vol. 4, pp. 705–710.
  • [24]
    T. Strufe, “Profile Popularity in a Business-Oriented Online Social Network,” Proc. 3rd Workshop on Social Network Syst. (SNS '10) (Paris, Fra., 2010), pp. 2:1–2:6.
  • [25]
    J. Ugander, B. Karrer, L. Backstrom, and C. Marlow, “The Anatomy of the Facebook Social Graph,” arXiv:1111.4503v1, Nov. 2011, <http://arxiv.org/abs/1111.4503>.
  • [26]
    M. Valafar, R. Rejaie, and W. Willinger, “Beyond Friendship Graphs: A Study of User Interactions in Flickr,” Proc. 2nd Workshop on Online Social Networks (WOSN '09) (Barcelona, Spn., 2009), pp. 25–30.
  • [27]
    B. Viswanath, A. Mislove, M. Cha, and K. P. Gummadi, “On the Evolution of User Interaction in Facebook,” Proc. 2nd Workshop on Online Social Networks (WOSN '09) (Barcelona, Spn., 2009), pp. 37–42.
  • [28]
    R. Xiang, J. Neville, and M. Rogati, “Modeling Relationship Strength in Online Social Networks,” Proc. 19th Internat. Conf. on World Wide Web (WWW '10) (Raleigh, NC, 2010), pp. 981–990.

Biographical Information

BRIAN D. FRIEDMAN is a member of technical staff in the Interactive Systems Research Department of the Multimedia Research Domain at Alcatel-Lucent Bell Labs, Murray Hill, New Jersey. He is currently working with several different groups of researchers developing experimental social networking and multimedia prototypes and applications. Having spent most of his career as a software developer and researcher within Bell Labs, he has also performed systems engineering, database architecture and design, user interface design, systems administration, and project management. Mr. Friedman earned a B.S. in computer science/electronics from the State University of New York at Binghamton in Binghamton, New York and a master of science in Advanced Technology − Specialization in Computer Science from the Thomas J. Watson School of Engineering in Binghamton, New York.

MICHAEL J. BURNS is a technical manager in the Interactive Systems Research Department of the Multimedia Research Domain at Alcatel-Lucent Bell Labs, Murray Hill, New Jersey. His current research is focused on the intersection of analytics, multimedia, and social media for collaboration and knowledge management. During his career he has performed systems engineering, user experience design, and software development on a wide range of operations support systems and multimedia- and web-based applications and services. He is a member of the Association for Computing Machinery (ACM). Dr. Burns earned a B.A. in psychology from Washington and Lee University in Lexington, Virginia, and an M.A. and a Ph.D. in cognitive psychology from the University of California, Los Angeles (UCLA).

JIN CAO is a distinguished member of technical staff in the IP Platforms Research Program at Bell Labs, Murray Hill, New Jersey. Dr. Cao earned a B.A. in Applied Mathematics from Tsinghua University, China, and a Ph.D. in statistics from McGill University, Montreal, Canada. Her thesis was on the statistical analysis of brain images. Since joining Bell Labs, Dr. Cao has done research in various areas, mostly focusing on statistical problems arising from data networks, for example, network tomography, traffic modeling and simulation, network monitoring and performance analysis, and data streaming algorithms.