Supplementary material
======================

Description of files in this directory

Data Files
==============
dblp-2013-12-23.zip - compressed dblp file from 23.12.2013 (original XML file: 1.4 GB)
dblp_id_gender_dict_v0.dict - ID-gender Python dictionary without cross-checking heuristics
dblp_id_gender_dict_v1.dict - ID-gender Python dictionary with cross-checking heuristics
name_gender_score.csv - list of names, the most likely gender of each name, and the score

Output
==============
dblp_id_name.csv - list of IDs and the corresponding names
dblp_edges_maxAuthor20.csv - edge list of co-author network with one entry for each collaboration and its year (multi graph of collaborations)
dblp_edges_maxAuthor20_simple.csv - edge list of co-author network with one entry for each collaboration and its year (simple graph of co-authorship)
dblp_edges_maxAuthor20_mentee_frequent.csv - edge list of student-mentor network (directed; the first entry of each line is the ID of a student, followed by the mentor ID)
dblp_edges_maxAuthor20_mentee_frequent_known_degree_v1.csv - degree statistics of student-mentor network (dictionary v1)
dblp_edges_maxAuthor20_mentee_frequent_known_degree_v0.csv - degree statistics of student-mentor network (dictionary v0)
dblp_edges_maxAuthor20_simple_degree_1000_edges_v1.csv - edge list of high-degree nodes with known gender in the dblp co-author graph (dictionary v1)
dblp_nodes_maxAuthor20_simple_degree_1000_nodes_v1.csv - node list of high-degree nodes with known gender in the dblp co-author graph (dictionary v1)
dblp_edges_maxAuthor20_simple_degree_1000_edges_v0.csv - edge list of high-degree nodes with known gender in the dblp co-author graph (dictionary v0)
dblp_nodes_maxAuthor20_simple_degree_1000_nodes_v0.csv - node list of high-degree nodes with known gender in the dblp co-author graph (dictionary v0)
degreesum_percentage_v1.csv - data points of the figure depicting the degree sum percentages
FM_v1.csv - data points of the figure depicting the expected and observed number of mixed edges
tail_glass_ceiling_v1.csv - data points of the figure depicting the percentage of women with a degree above k

Mathematica Code Files
====================
dblp_top_clean.nb - generate networks for top nodes in dblp
dblp_bias_stat_clean.nb - degree distribution in student-mentor graph
A_D_Proof.nb - C_B and C_R behavior
Vcolor_simple_v1.m - node colors in Mathematica format

Python Source
==============
parse_DBLP_xml.py - parse the XML and create the collaboration graph for all publications with at most 20 co-authors
convert_to_simple_graph - turn the multi-graph (with possibly more than one edge per vertex pair) into a simple graph
create_mentor_file.py - create the mentor graph based on the co-author graph and the first publication years
create_id_degree_file.py - create a list of the degrees of each node
extract_top_graph.py - compute the top nodes with a known gender and the induced co-author subgraph
compute_known_degree.py - create the degree distribution file for degree/female/male/both/total of IDs with known gender
README.txt - text file explaining how to run the Python code
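
Example (illustrative sketch only)
==============
The multi-graph to simple-graph step listed above can be pictured with a short sketch. This is not the code shipped in this directory. It assumes that each row of the multi-graph edge list has the form "id_a,id_b,year" (the actual column layout is not documented here), that edges are undirected, and that keeping the earliest year per author pair is an acceptable way to merge parallel edges. The function name collapse_to_simple and the sample data are hypothetical.

```python
import csv
import io


def collapse_to_simple(rows):
    """Collapse a multigraph edge list into a simple, undirected edge list.

    rows: iterable of (author_a, author_b, year) tuples (assumed layout).
    Returns a sorted list of (a, b, first_year) with a < b, one per pair.
    """
    first_year = {}
    for a, b, year in rows:
        if a == b:
            continue  # skip self-loops; a simple graph has none
        key = (a, b) if a < b else (b, a)  # undirected: order the pair
        year = int(year)
        # Assumption: keep the earliest collaboration year for each pair.
        if key not in first_year or year < first_year[key]:
            first_year[key] = year
    return sorted((a, b, y) for (a, b), y in first_year.items())


# Tiny in-memory stand-in for a multigraph edge list file (made-up IDs).
sample = io.StringIO("1,2,2001\n2,1,1999\n1,3,2005\n1,2,2010\n")
rows = [(int(a), int(b), y) for a, b, y in csv.reader(sample)]
simple_edges = collapse_to_simple(rows)
print(simple_edges)
```

In this sketch the three parallel entries between IDs 1 and 2 merge into a single edge. For the real files, the column order and the choice of which year to keep should be checked against the scripts themselves.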