{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Cleaning scripts" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Harmonize MX and Dmitry Nikolayev's datasets\n", "\n", "The replication datasets of Michalopoulos and Xue (MX) contain motif data on 958 ethnic groups, taken from the 2019 version of Yuri Berezkin's catalog. However, they do not include the locations of these groups. Independently of MX, Dmitry Nikolayev (DN) provides coordinates for the ethnic groups [here](https://github.com/macleginn/mythology-queries), but this dataset covers only 926 groups and appears to be based on an older version of Berezkin's catalog.\n", "\n", "This notebook reconciles the MX and DN datasets as far as possible." ] }, { "cell_type": "code", "execution_count": 774, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import re\n", "from sklearn.linear_model import LinearRegression" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Load the MX dataset." ] }, { "cell_type": "code", "execution_count": 640, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | group_Berezkin | \n", "a1 | \n", "a10 | \n", "a11a | \n", "a11b | \n", "a11c | \n", "a12 | \n", "a12a | \n", "a12b | \n", "a12c | \n", "... | \n", "n28 | \n", "n29 | \n", "n3 | \n", "n30 | \n", "n4 | \n", "n5 | \n", "n6 | \n", "n7 | \n", "n8 | \n", "n9 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "Abaza (Abazins) | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "... | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
1 | \n", "Abkhaz | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "
2 | \n", "Aceh | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
3 | \n", "Ache | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4 | \n", "Achomavi | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
953 | \n", "Teleut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
954 | \n", "Central Yakuts | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
955 | \n", "Arabs: Iraq | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
956 | \n", "Liaoning and Jilin Chinese | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
957 | \n", "Norvegians, Faroe islanders | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
958 rows × 2565 columns
\n", "\n", " | Longitude | \n", "Latitude | \n", "Name | \n", "
---|---|---|---|
0 | \n", "20.0 | \n", "-26.0 | \n", "Bushmen | \n", "
1 | \n", "21.0 | \n", "-32.0 | \n", "Khoikhoi | \n", "
2 | \n", "26.5 | \n", "-32.5 | \n", "Xhosa | \n", "
3 | \n", "30.5 | \n", "-28.5 | \n", "Zulu,Swasi | \n", "
4 | \n", "26.5 | \n", "-27.5 | \n", "Sotho, Tswana | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
921 | \n", "-72.5 | \n", "-39.0 | \n", "Mapuche | \n", "
922 | \n", "-67.5 | \n", "-42.0 | \n", "Puelche | \n", "
923 | \n", "-69.0 | \n", "-47.0 | \n", "Tehuelche | \n", "
924 | \n", "-68.5 | \n", "-54.5 | \n", "Selknam | \n", "
925 | \n", "-71.0 | \n", "-55.0 | \n", "Yamana | \n", "
926 rows × 3 columns
\n", "\n", " | group_Berezkin | \n", "a1 | \n", "a10 | \n", "a11a | \n", "a11b | \n", "a11c | \n", "a12 | \n", "a12a | \n", "a12b | \n", "a12c | \n", "... | \n", "n28 | \n", "n29 | \n", "n3 | \n", "n30 | \n", "n4 | \n", "n5 | \n", "n6 | \n", "n7 | \n", "n8 | \n", "n9 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
941 | \n", "NW Yakuts (Yessey,Anabar,Olenyok, Lower Lena) | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
945 | \n", "NE Yakuts (Yana,Indigirka,Kolyma) | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
950 | \n", "Western Yakuts (Olyokma,Vilyuy) | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
954 | \n", "Central Yakuts | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4 rows × 2565 columns
\n", "\n", " | mx_name | \n", "coords_name | \n", "
---|---|---|
0 | \n", "Wolof | \n", "Fulbe,Wolof,Serer | \n", "
1 | \n", "Serer | \n", "Fulbe,Wolof,Serer | \n", "
2 | \n", "Japan AD 700-1700 | \n", "Japan | \n", "
3 | \n", "Japanese folklore | \n", "Japan | \n", "
4 | \n", "NW Yakuts (Yessey,Anabar,Olenyok, Lower Lena) | \n", "Yakut | \n", "
5 | \n", "Central Yakuts | \n", "Yakut | \n", "
6 | \n", "NE Yakuts (Yana,Indigirka,Kolyma) | \n", "Yakut | \n", "
7 | \n", "Western Yakuts (Olyokma,Vilyuy) | \n", "Yakut | \n", "
8 | \n", "Trans NG East Highlands | \n", "Trans New Guinea East | \n", "
9 | \n", "Trans NG East Lowlands North | \n", "Trans New Guinea East | \n", "
10 | \n", "Trans NG East Lowlands South | \n", "Trans New Guinea East | \n", "
11 | \n", "Northern Khanty | \n", "Hanty | \n", "
12 | \n", "Southern Khanty | \n", "Hanty | \n", "
13 | \n", "Eastern Khanty(Ostyaks) | \n", "Hanty | \n", "
14 | \n", "Forest Yukaghir (Upper Kolyma) | \n", "Yukaghir | \n", "
15 | \n", "Tundra Yukaghir (Lower Kolyma) | \n", "Yukaghir | \n", "
16 | \n", "Lahu,Sani,Nasu,Jino | \n", "Lahu,Sani,Hani,Nasu,Jino | \n", "
17 | \n", "Hani, Akha | \n", "Lahu,Sani,Hani,Nasu,Jino | \n", "
\n", " | group_Berezkin | \n", "a1 | \n", "a10 | \n", "a11a | \n", "a11b | \n", "a11c | \n", "a12 | \n", "a12a | \n", "a12b | \n", "a12c | \n", "... | \n", "n28 | \n", "n29 | \n", "n3 | \n", "n30 | \n", "n4 | \n", "n5 | \n", "n6 | \n", "n7 | \n", "n8 | \n", "n9 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
941 | \n", "Yakut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
945 | \n", "Yakut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
950 | \n", "Yakut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
954 | \n", "Yakut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4 rows × 2565 columns
\n", "\n", " | group_Berezkin | \n", "a1 | \n", "a10 | \n", "a11a | \n", "a11b | \n", "a11c | \n", "a12 | \n", "a12a | \n", "a12b | \n", "a12c | \n", "... | \n", "n28 | \n", "n29 | \n", "n3 | \n", "n30 | \n", "n4 | \n", "n5 | \n", "n6 | \n", "n7 | \n", "n8 | \n", "n9 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
912 | \n", "Yakut | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "... | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
1 rows × 2565 columns
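
The three outputs above appear to select the MX rows whose name contains "Yakut", rename MX groups to the DN names listed in the `mx_name`/`coords_name` crosswalk, and collapse rows that end up sharing a name. The single aggregated Yakut row is consistent with a column-wise maximum of the 0/1 motif indicators (a motif counts as present if any subgroup records it) rather than a sum. Below is a minimal sketch of that logic on toy rows; the helper `harmonize_names` is hypothetical, and the choice of `max` is inferred from the output rather than taken from the notebook's code.

```python
import pandas as pd


def harmonize_names(mx: pd.DataFrame, crosswalk: pd.DataFrame) -> pd.DataFrame:
    """Rename MX groups to DN names and merge rows that now share a name (hypothetical helper)."""
    mapping = dict(zip(crosswalk["mx_name"], crosswalk["coords_name"]))
    renamed = mx.assign(group_Berezkin=mx["group_Berezkin"].replace(mapping))
    # A motif is present for the merged group if any subgroup records it,
    # so take the column-wise maximum of the 0/1 indicators.
    return renamed.groupby("group_Berezkin", as_index=False).max()


# Toy rows mirroring the Yakut subgroups shown above (three motif columns only).
mx = pd.DataFrame({
    "group_Berezkin": [
        "NW Yakuts (Yessey,Anabar,Olenyok, Lower Lena)",
        "NE Yakuts (Yana,Indigirka,Kolyma)",
        "Abkhaz",
    ],
    "a12": [0, 1, 1],
    "n29": [1, 0, 0],
    "n5": [0, 1, 0],
})
crosswalk = pd.DataFrame({
    "mx_name": [
        "NW Yakuts (Yessey,Anabar,Olenyok, Lower Lena)",
        "NE Yakuts (Yana,Indigirka,Kolyma)",
    ],
    "coords_name": ["Yakut", "Yakut"],
})

print(mx[mx["group_Berezkin"].str.contains("Yakut")])
out = harmonize_names(mx, crosswalk)
print(out)
```

With `max` the indicators stay binary; a sum would, for example, give the merged Yakut group a value of 3 for `n5`, which three of the four Yakut subgroups record.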
\n", "\n", " | group_Berezkin | \n", "Longitude | \n", "Latitude | \n", "Name | \n", "
---|---|---|---|---|
24 | \n", "Almora (Rangkas) | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
40 | \n", "Arabs (literary tradition) | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
253 | \n", "Fujian Chinese | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
254 | \n", "Fula (Pular) | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
258 | \n", "Galicians | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
274 | \n", "Gulf: Kuwait,Bahrain,Qatar,Oman | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
286 | \n", "Henan Chinese | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
291 | \n", "Himachali Pahari | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
304 | \n", "Iban,Bidayu,Sakarram | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
306 | \n", "Icelanders (after A.D. 1800) | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
326 | \n", "Jiangsu and Zhejang Chinese | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
381 | \n", "Khotan Saka | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
439 | \n", "Liaoning and Jilin Chinese | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
475 | \n", "Maldives | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
674 | \n", "Salars | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
690 | \n", "Scandinavians, early written sources | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
714 | \n", "Sichuan Chinese | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
781 | \n", "Teleut | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
829 | \n", "Tujia | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
830 | \n", "Tulu | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
864 | \n", "Urums, Rumei | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
880 | \n", "Wallons | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
907 | \n", "Xinca | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
927 | \n", "Yeyi | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
947 | \n", "NaN | \n", "-20.0 | \n", "65.0 | \n", "Edda,Saxo Grammaticus | \n", "
948 | \n", "NaN | \n", "40.0 | \n", "32.0 | \n", "1001 nights | \n", "
949 | \n", "NaN | \n", "111.0 | \n", "1.0 | \n", "Other Dayak | \n", "
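
The table above lists names left unmatched: rows with a `group_Berezkin` but no coordinates exist only in MX, and rows with a DN `Name` but no `group_Berezkin` exist only in DN. One way to produce such a list, and the lower-case `group`/`longitude`/`latitude` frame shown next, is an outer merge with pandas' `indicator=True` option. The sketch below assumes that approach (the notebook's own code is not shown) and uses toy rows copied from the outputs.

```python
import pandas as pd

# Toy subsets of the two sources; names and coordinates are copied from the outputs.
mx = pd.DataFrame({"group_Berezkin": ["Abkhaz", "Aceh", "Teleut"]})
dn = pd.DataFrame({
    "Longitude": [40.8, 95.6, 40.0],
    "Latitude": [43.2, 5.3, 32.0],
    "Name": ["Abkhaz", "Aceh", "1001 nights"],
})

merged = mx.merge(dn, how="outer", left_on="group_Berezkin", right_on="Name", indicator=True)

# left_only: MX group without coordinates; right_only: DN entry without an MX group.
unmatched = merged[merged["_merge"] != "both"].drop(columns="_merge")
print(unmatched)

# Matched rows with the lower-case column names used downstream.
matched = (
    merged.loc[merged["_merge"] == "both", ["group_Berezkin", "Longitude", "Latitude"]]
    .rename(columns={"group_Berezkin": "group", "Longitude": "longitude", "Latitude": "latitude"})
    .reset_index(drop=True)
)
print(matched)
```

`left_only` rows need a crosswalk entry or must be dropped; `right_only` rows such as `1001 nights` are DN entries with no MX counterpart under the same name.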
\n", " | group | \n", "longitude | \n", "latitude | \n", "
---|---|---|---|
0 | \n", "Abaza (Abazins) | \n", "42.0 | \n", "44.2 | \n", "
1 | \n", "Abenaki,Penobscot | \n", "-70.5 | \n", "44.5 | \n", "
2 | \n", "Abkhaz | \n", "40.8 | \n", "43.2 | \n", "
3 | \n", "Abor,Gallong,Tani | \n", "95.0 | \n", "28.5 | \n", "
4 | \n", "Aceh | \n", "95.6 | \n", "5.3 | \n", "
\n", " | motif_id | \n", "title_english | \n", "title_russian | \n", "title_english_googleAPI | \n", "desc_eng | \n", "desc_russian | \n", "desc_english_googleAPI | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "a1 | \n", "The old sun | \n", "Древнее солнце | \n", "Ancient sun | \n", "Another sun, usually less benevolent and/or po... | \n", "Другое солнце – обычно менее могущественное ил... | \n", "Another sun — usually less powerful or less be... | \n", "
1 | \n", "a10 | \n", "The sun finds its eyes | \n", "Солнце находит себе глаза | \n", "The sun finds its eyes | \n", "The sun gets his bright eye or eyes from an an... | \n", "Солнце получает свои сверкающие глаза (глаз) о... | \n", "The sun gets its sparkling eyes (eyes) from th... | \n", "
2 | \n", "a11a | \n", "Eyes of the Sun and the Moon: coolness and night | \n", "Глаза светил: прохлада и ночь | \n", "Eyes of the luminaries: coolness and night | \n", "Visible sun and/or moon are the Sun's and/or t... | \n", "Видимое солнце или луна есть их глаза; если бы... | \n", "The visible sun or moon is their eyes; if the ... | \n", "
3 | \n", "a11b | \n", "One-eyed luminaries | \n", "Одноглазые светила | \n", "One-eyed luminaries | \n", "The Sun or the Moon have only one eye (the Mun... | \n", "Солнце или Месяц одноглаз (мундуруку: слеп) | \n", "Sun or Month odnoglaz (Munduruku: blind) | \n", "
4 | \n", "a11c | \n", "The Sun, the Moon and monster’s eyes | \n", "Солнце, Луна и глаза чудовища | \n", "Sun, moon and monster eyes | \n", "The Sun and the Moon kill a monster whose eyes... | \n", "Солнце и Луна убивают чудовище, чьи глаза свет... | \n", "The sun and the moon kill a monster whose eyes... | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
2559 | \n", "n5 | \n", "They recognize winter by rime, summer by rain | \n", "Зиму узнают по инею, лето по дождю | \n", "Winter learn by hoarfrost, summer by rain | \n", "Long trips, campaigns, flights or battles are ... | \n", "Длительные поездки, походы, полеты или битвы о... | \n", "Long trips, trips, flights or battles are desc... | \n", "
2560 | \n", "n6 | \n", "Horse tells to whip him strongly | \n", "Хлестнуть коня | \n", "Whip a horse | \n", "A horse tells his rider to whip him with such ... | \n", "Конь велит всаднику хлестнуть его так сильно, ... | \n", "The horse tells the rider to whip him so hard ... | \n", "
2561 | \n", "n7 | \n", "Three apples | \n", "Три яблока | \n", "Three apples | \n", "Closing formula of the folktale: three apples ... | \n", "Сказочный текст завершается формулой, сообщающ... | \n", "The fabulous text ends with a formula that sta... | \n", "
2562 | \n", "n8 | \n", "Storyteller instead of a cannonball | \n", "Сказочник вместо ядра | \n", "The storyteller instead of the core | \n", "Closing formula of the folktale: characters pu... | \n", "Сказочный текст завершается формулой, сообщающ... | \n", "The fabulous text ends with a formula that sta... | \n", "
2563 | \n", "n9 | \n", "Who is coming? | \n", "Кто приближается? | \n", "Who is coming? | \n", "Two persons see a horseman who is ever nearer ... | \n", "Двое персонажей обсуждают приближение всадника... | \n", "Two characters are discussing the approach of ... | \n", "
2564 rows × 7 columns
\n", "\n", " | motif_id | \n", "title | \n", "description | \n", "
---|---|---|---|
0 | \n", "a1 | \n", "The old sun | \n", "Another sun, usually less benevolent and/or po... | \n", "
1 | \n", "a10 | \n", "The sun finds its eyes | \n", "The sun gets his bright eye or eyes from an an... | \n", "
2 | \n", "a11a | \n", "Eyes of the Sun and the Moon: coolness and night | \n", "Visible sun and/or moon are the Sun's and/or t... | \n", "
3 | \n", "a11b | \n", "One-eyed luminaries | \n", "The Sun or the Moon have only one eye (the Mun... | \n", "
4 | \n", "a11c | \n", "The Sun, the Moon and monster’s eyes | \n", "The Sun and the Moon kill a monster whose eyes... | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
2559 | \n", "n5 | \n", "They recognize winter by rime, summer by rain | \n", "Long trips, campaigns, flights or battles are ... | \n", "
2560 | \n", "n6 | \n", "Horse tells to whip him strongly | \n", "A horse tells his rider to whip him with such ... | \n", "
2561 | \n", "n7 | \n", "Three apples | \n", "Closing formula of the folktale: three apples ... | \n", "
2562 | \n", "n8 | \n", "Storyteller instead of a cannonball | \n", "Closing formula of the folktale: characters pu... | \n", "
2563 | \n", "n9 | \n", "Who is coming? | \n", "Two persons see a horseman who is ever nearer ... | \n", "
2564 rows × 3 columns
\n", "\n", " | motif_id | \n", "title | \n", "description | \n", "
---|---|---|---|
99 | \n", "a8a | \n", "The Sun, the Moon and the star: released by th... | \n", "\n", " |
770 | \n", "h21a | \n", "Not to kill a big fish | \n", "\n", " |
1099 | \n", "i97 | \n", "Rainbow horse | \n", "\n", " |
1833 | \n", "l23b | \n", "Transformation into spindle | \n", "\n", " |
1859 | \n", "l37a | \n", "To get know causes of problems | \n", "\n", " |
2011 | \n", "m105a | \n", "Make believe killing of children | \n", "\n", " |
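
The catalogue carries English, Russian and machine-translated columns; the second output keeps only `motif_id`, `title_english` and `desc_eng` under shorter names, and the third lists motifs whose English description is empty. Below is a minimal sketch of both steps on two toy rows (ids and titles from the outputs, description text a placeholder). It treats a missing value and a whitespace-only string alike as blank, which is an assumption about how the notebook defined "empty".

```python
import pandas as pd

# Toy catalogue: motif ids and titles come from the outputs above,
# the description text is a placeholder.
motifs = pd.DataFrame({
    "motif_id": ["a1", "h21a"],
    "title_english": ["The old sun", "Not to kill a big fish"],
    "desc_eng": ["placeholder description", None],
})

# Keep the English columns under shorter names.
clean = motifs[["motif_id", "title_english", "desc_eng"]].rename(
    columns={"title_english": "title", "desc_eng": "description"}
)

# Treat NaN/None and whitespace-only strings alike as "no description".
blank = clean["description"].fillna("").str.strip() == ""
print(clean[blank])
```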
\n", " | motif_id | \n", "say_related | \n", "one_related | \n", "go_related | \n", "get_related | \n", "would_related | \n", "know_related | \n", "make_related | \n", "like_related | \n", "think_related | \n", "... | \n", "mindful_related | \n", "optimum_related | \n", "repercussion_related | \n", "shabby_related | \n", "subjectivity_related | \n", "aspiring_related | \n", "distorted_related | \n", "galley_related | \n", "overlapping_related | \n", "situational_related | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "a1 | \n", "[] | \n", "['one', 'another'] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
1 | \n", "a10 | \n", "[] | \n", "[] | \n", "['get'] | \n", "['get', 'find'] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2 | \n", "a11a | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "['would'] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
3 | \n", "a11b | \n", "[] | \n", "['one'] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
4 | \n", "a11c | \n", "[] | \n", "[] | \n", "[] | \n", "['take', 'give'] | \n", "[] | \n", "[] | \n", "['give'] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
2559 | \n", "n5 | \n", "['describe', 'mean', 'know'] | \n", "[] | \n", "['get'] | \n", "['get'] | \n", "[] | \n", "['know', 'recognize', 'learn'] | \n", "[] | \n", "['like', 'similar'] | \n", "['know'] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2560 | \n", "n6 | \n", "['tell'] | \n", "[] | \n", "['come'] | \n", "['come'] | \n", "['would'] | \n", "['tell'] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2561 | \n", "n7 | \n", "['say'] | \n", "['one', 'three', 'least'] | \n", "['get'] | \n", "['get', 'give'] | \n", "[] | \n", "['say'] | \n", "['give'] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2562 | \n", "n8 | \n", "[] | \n", "[] | \n", "[] | \n", "['arrive'] | \n", "[] | \n", "[] | \n", "['make'] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2563 | \n", "n9 | \n", "[] | \n", "['one', 'two'] | \n", "['come'] | \n", "['come'] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "... | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "[] | \n", "
2564 rows × 9887 columns
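
Each `<keyword>_related` cell above holds the words, drawn from a list of terms related to that keyword, that occur in the motif's text. How those term lists were built is not shown. The sketch below assumes a small hypothetical dictionary `related_terms` and plain lower-case word matching, and keeps matches in list order; the helper `related_columns` is likewise hypothetical.

```python
import re

import pandas as pd

# Hypothetical related-term lists; the notebook's actual lists are not shown.
related_terms = {
    "one": ["one", "two", "three", "another"],
    "get": ["get", "find", "take", "give"],
}


def related_columns(text: str) -> dict:
    """Return {'<keyword>_related': [matched terms]} for one motif text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {
        f"{key}_related": [term for term in terms if term in tokens]
        for key, terms in related_terms.items()
    }


# Toy motif texts, not taken from the catalogue.
motifs = pd.DataFrame({
    "motif_id": ["m1", "m2"],
    "text": ["Two people find one another", "A horse takes the rider"],
})
flags = pd.DataFrame([related_columns(t) for t in motifs["text"]])
result = pd.concat([motifs[["motif_id"]], flags], axis=1)
print(result)
```

Plain matching misses inflected forms: "takes" does not match "take" here, whereas the real output maps motif a10's text ("The sun gets his bright eye...") to `get`, which suggests the notebook lemmatized or stemmed the text first.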
\n", "\n", " | column | \n", "
---|---|
0 | \n", "motif_id | \n", "
1 | \n", "say_related | \n", "
2 | \n", "one_related | \n", "
3 | \n", "go_related | \n", "
4 | \n", "get_related | \n", "
... | \n", "... | \n", "
9882 | \n", "aspiring_related | \n", "
9883 | \n", "distorted_related | \n", "
9884 | \n", "galley_related | \n", "
9885 | \n", "overlapping_related | \n", "
9886 | \n", "situational_related | \n", "
9887 rows × 1 columns
\n", "\n", " | column | \n", "
---|---|
96 | \n", "question_related | \n", "
230 | \n", "research_related | \n", "
644 | \n", "search_related | \n", "
1024 | \n", "researcher_related | \n", "
1331 | \n", "purchase_related | \n", "
1346 | \n", "request_related | \n", "
2218 | \n", "hunt_related | \n", "
2460 | \n", "hunting_related | \n", "
2526 | \n", "deer_related | \n", "
2596 | \n", "chase_related | \n", "
2655 | \n", "hunter_related | \n", "
2991 | \n", "questionnaire_related | \n", "
3569 | \n", "pursuit_related | \n", "
3676 | \n", "quest_related | \n", "
5679 | \n", "questionable_related | \n", "
5856 | \n", "questioning_related | \n", "
6876 | \n", "conquest_related | \n", "
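
The list above looks like the result of substring matching against the column names, which sweeps in unrelated words: `purchase_related` contains "chase", `research_related` contains "search", and `question_related` and `conquest_related` contain "quest". The sketch below uses a keyword set inferred from the output (an assumption) and contrasts substring matching with a pattern anchored on the `_related` suffix.

```python
import pandas as pd

# A few column names taken from the outputs above.
cols = pd.DataFrame({"column": [
    "motif_id", "question_related", "research_related", "purchase_related",
    "hunt_related", "hunting_related", "hunter_related", "deer_related",
    "chase_related", "pursuit_related", "quest_related", "conquest_related",
    "say_related",
]})

# Keyword stems inferred from the output (an assumption).
stems = ["hunt", "deer", "chase", "pursuit", "quest", "search"]

# Substring matching: also catches purchase, research, question, conquest.
substring_hits = cols[cols["column"].str.contains("|".join(stems))]

# Anchored matching: the keyword must be the whole word before "_related";
# "hunt\w*" additionally admits the inflected forms hunting and hunter.
pattern = r"^(?:hunt\w*|deer|chase|pursuit|quest|search)_related$"
anchored_hits = cols[cols["column"].str.contains(pattern)]

print(sorted(set(substring_hits["column"]) - set(anchored_hits["column"])))
```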
\n", " | group | \n", "earthquakes | \n", "
---|---|---|
0 | \n", "Abaza (Abazins) | \n", "0.000000 | \n", "
1 | \n", "Abenaki,Penobscot | \n", "0.031250 | \n", "
2 | \n", "Abkhaz | \n", "0.003279 | \n", "
3 | \n", "Abor,Gallong,Tani | \n", "0.012422 | \n", "
4 | \n", "Aceh | \n", "0.000000 | \n", "
... | \n", "... | \n", "... | \n", "
918 | \n", "Zaparo | \n", "0.000000 | \n", "
919 | \n", "Zapotec,Chatino | \n", "0.000000 | \n", "
920 | \n", "Zoque | \n", "0.000000 | \n", "
921 | \n", "Zulu,Swasi | \n", "0.000000 | \n", "
922 | \n", "Zuni | \n", "0.000000 | \n", "
923 rows × 2 columns
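
The `earthquakes` column reads as the share of each group's motifs that are flagged as earthquake-related; for instance, 0.031250 would correspond to 1 flagged motif out of 32. Below is a sketch under that assumption, using a toy 0/1 group-by-motif matrix like the MX data and hypothetical motif flags; the helper `concept_share` is also hypothetical.

```python
import pandas as pd

# Toy 0/1 matrix: rows are groups, columns are motifs (1 = motif recorded).
presence = pd.DataFrame(
    {"a1": [1, 0, 1], "a10": [1, 1, 0], "n5": [1, 1, 1], "n6": [1, 0, 1]},
    index=pd.Index(["Group A", "Group B", "Group C"], name="group"),
)
# Hypothetical motif flags: 1 if the motif's text mentions the concept.
is_concept = pd.Series({"a1": 1, "a10": 0, "n5": 0, "n6": 1})


def concept_share(presence: pd.DataFrame, flags: pd.Series) -> pd.Series:
    """Share of each group's recorded motifs that carry the concept flag."""
    flags = flags.reindex(presence.columns, fill_value=0)
    return presence.dot(flags) / presence.sum(axis=1)


shares = concept_share(presence, is_concept).rename("earthquakes").reset_index()
print(shares)
```

A group with no recorded motifs would give 0/0 and hence NaN, so such groups are best dropped before computing shares.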
\n", "\n", " | group | \n", "longitude | \n", "latitude | \n", "concept | \n", "share | \n", "
---|---|---|---|---|---|
0 | \n", "Abaza (Abazins) | \n", "42.0 | \n", "44.2 | \n", "earthquakes | \n", "0.000000 | \n", "
1 | \n", "Abenaki,Penobscot | \n", "-70.5 | \n", "44.5 | \n", "earthquakes | \n", "0.031250 | \n", "
2 | \n", "Abkhaz | \n", "40.8 | \n", "43.2 | \n", "earthquakes | \n", "0.003279 | \n", "
3 | \n", "Abor,Gallong,Tani | \n", "95.0 | \n", "28.5 | \n", "earthquakes | \n", "0.012422 | \n", "
4 | \n", "Aceh | \n", "95.6 | \n", "5.3 | \n", "earthquakes | \n", "0.000000 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
5533 | \n", "Zaparo | \n", "-75.0 | \n", "-2.5 | \n", "hunting | \n", "0.285714 | \n", "
5534 | \n", "Zapotec,Chatino | \n", "-96.5 | \n", "16.5 | \n", "hunting | \n", "0.166667 | \n", "
5535 | \n", "Zoque | \n", "-92.5 | \n", "16.5 | \n", "hunting | \n", "0.109091 | \n", "
5536 | \n", "Zulu,Swasi | \n", "30.5 | \n", "-28.5 | \n", "hunting | \n", "0.092593 | \n", "
5537 | \n", "Zuni | \n", "-109.0 | \n", "35.0 | \n", "hunting | \n", "0.065421 | \n", "
5538 rows × 5 columns
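
The long table stacks one row per group and concept; its 5538 rows are consistent with 923 groups times 6 concepts. Below is a sketch of the reshaping with `DataFrame.merge` and `DataFrame.melt`; the coordinates and `earthquakes` values come from the outputs, while the `hunting` values are toy numbers.

```python
import pandas as pd

coords = pd.DataFrame({
    "group": ["Abaza (Abazins)", "Abkhaz"],
    "longitude": [42.0, 40.8],
    "latitude": [44.2, 43.2],
})
# One column per concept; the hunting values here are toy numbers.
shares = pd.DataFrame({
    "group": ["Abaza (Abazins)", "Abkhaz"],
    "earthquakes": [0.0, 0.003279],
    "hunting": [0.1, 0.2],
})

# Attach coordinates, then stack the concept columns into (concept, share) rows.
wide = coords.merge(shares, on="group", how="inner")
long = wide.melt(
    id_vars=["group", "longitude", "latitude"],
    var_name="concept",
    value_name="share",
)
print(long)
```

`melt` stacks the concept columns in order, so all `earthquakes` rows come before the `hunting` rows, matching the layout of the table above.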
\n", "\n", " | cntry | \n", "lrgdpch2010 | \n", "lnp06_18pc | \n", "lnavgy06_18 | \n", "fem19 | \n", "trust_wvsavg | \n", "lntrust_wvsavg | \n", "risktaking | \n", "trust_gps | \n", "patience | \n", "... | \n", "harm_vice | \n", "fair_vice | \n", "ingroup_vice | \n", "auth_vice | \n", "purity_vice | \n", "harm_virtue | \n", "fair_virtue | \n", "ingroup_virtue | \n", "auth_virtue | \n", "purity_virtue | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "AFG | \n", "6.955211 | \n", "NaN | \n", "-1.678959 | \n", "48.848999 | \n", "NaN | \n", "NaN | \n", "0.120764 | \n", "0.315964 | \n", "-0.201360 | \n", "... | \n", "42.266427 | \n", "0.952258 | \n", "14.069490 | \n", "3.765717 | \n", "7.757953 | \n", "3.237963 | \n", "2.588764 | \n", "42.566406 | \n", "23.111392 | \n", "3.172359 | \n", "
1 | \n", "AGO | \n", "8.538473 | \n", "NaN | \n", "NaN | \n", "75.372002 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "23.754314 | \n", "1.250721 | \n", "9.719039 | \n", "1.912417 | \n", "8.483376 | \n", "3.865623 | \n", "0.673173 | \n", "25.869074 | \n", "10.680665 | \n", "3.299046 | \n", "
2 | \n", "AIA | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "40.000000 | \n", "0.000000 | \n", "24.000000 | \n", "5.000000 | \n", "11.000000 | \n", "4.000000 | \n", "3.000000 | \n", "65.000000 | \n", "40.000000 | \n", "5.000000 | \n", "
3 | \n", "ALB | \n", "8.797417 | \n", "-0.805970 | \n", "0.162193 | \n", "47.081001 | \n", "1.192248 | \n", "0.175841 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "63.299846 | \n", "0.115660 | \n", "25.964083 | \n", "8.133978 | \n", "14.429597 | \n", "8.202667 | \n", "3.342797 | \n", "80.812782 | \n", "48.338948 | \n", "5.107501 | \n", "
4 | \n", "AND | \n", "NaN | \n", "0.211741 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "82.000516 | \n", "0.000000 | \n", "27.000358 | \n", "9.000020 | \n", "18.000079 | \n", "9.000000 | \n", "4.000040 | \n", "111.000854 | \n", "59.000357 | \n", "8.000060 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
194 | \n", "WSM | \n", "NaN | \n", "-0.636054 | \n", "-0.144895 | \n", "23.587999 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "6.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "2.000000 | \n", "0.000000 | \n", "0.000000 | \n", "8.000000 | \n", "3.000000 | \n", "0.000000 | \n", "
195 | \n", "YEM | \n", "7.780259 | \n", "-2.526622 | \n", "NaN | \n", "5.827000 | \n", "1.403987 | \n", "0.339316 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "41.246887 | \n", "1.000000 | \n", "14.811722 | \n", "6.000000 | \n", "7.094139 | \n", "1.282417 | \n", "2.094139 | \n", "51.776191 | \n", "28.964469 | \n", "2.905861 | \n", "
196 | \n", "ZAF | \n", "8.924416 | \n", "0.369796 | \n", "2.071975 | \n", "48.768002 | \n", "1.212878 | \n", "0.192996 | \n", "0.970596 | \n", "-0.166918 | \n", "0.057912 | \n", "... | \n", "29.194306 | \n", "0.536004 | \n", "12.453652 | \n", "3.678451 | \n", "7.425861 | \n", "4.298234 | \n", "1.763692 | \n", "41.109678 | \n", "19.954455 | \n", "3.224105 | \n", "
197 | \n", "ZMB | \n", "7.324647 | \n", "-2.477074 | \n", "0.008516 | \n", "70.785004 | \n", "1.115467 | \n", "0.109273 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "10.663001 | \n", "0.902600 | \n", "4.673161 | \n", "2.997479 | \n", "3.888401 | \n", "1.197730 | \n", "0.437495 | \n", "14.061597 | \n", "6.331164 | \n", "1.679670 | \n", "
198 | \n", "ZWE | \n", "5.765326 | \n", "-2.742053 | \n", "0.691748 | \n", "78.733002 | \n", "1.087762 | \n", "0.084122 | \n", "0.523195 | \n", "-0.509133 | \n", "-0.238587 | \n", "... | \n", "18.091701 | \n", "1.646761 | \n", "7.775430 | \n", "2.343199 | \n", "4.236911 | \n", "4.851816 | \n", "1.968671 | \n", "20.617171 | \n", "7.614549 | \n", "1.679720 | \n", "
199 rows × 9146 columns
\n", "\n", " | cntry | \n", "lntrust_wvsavg | \n", "tricksters_punish | \n", "risktaking | \n", "challenge_competition | \n", "fem19 | \n", "malebias | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "AFG | \n", "NaN | \n", "-0.024060 | \n", "0.120764 | \n", "0.064749 | \n", "48.848999 | \n", "0.301265 | \n", "
1 | \n", "AGO | \n", "NaN | \n", "-0.090911 | \n", "NaN | \n", "0.080376 | \n", "75.372002 | \n", "0.081707 | \n", "
2 | \n", "AIA | \n", "NaN | \n", "-0.005747 | \n", "NaN | \n", "0.091954 | \n", "NaN | \n", "0.160920 | \n", "
3 | \n", "ALB | \n", "0.175841 | \n", "-0.028450 | \n", "NaN | \n", "0.053650 | \n", "47.081001 | \n", "0.249465 | \n", "
4 | \n", "AND | \n", "NaN | \n", "-0.009901 | \n", "NaN | \n", "0.072607 | \n", "NaN | \n", "0.211221 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
194 | \n", "WSM | \n", "NaN | \n", "0.022222 | \n", "NaN | \n", "0.022222 | \n", "23.587999 | \n", "0.044444 | \n", "
195 | \n", "YEM | \n", "0.339316 | \n", "0.033566 | \n", "NaN | \n", "0.075617 | \n", "5.827000 | \n", "0.248730 | \n", "
196 | \n", "ZAF | \n", "0.192996 | \n", "-0.021237 | \n", "0.970596 | \n", "0.072578 | \n", "48.768002 | \n", "0.150613 | \n", "
197 | \n", "ZMB | \n", "0.109273 | \n", "-0.058759 | \n", "NaN | \n", "0.073665 | \n", "70.785004 | \n", "0.102419 | \n", "
198 | \n", "ZWE | \n", "0.084122 | \n", "-0.061962 | \n", "0.523195 | \n", "0.090295 | \n", "78.733002 | \n", "0.157501 | \n", "
199 rows × 7 columns
\n", "\n", " | cntry | \n", "lntrust_wvsavg | \n", "tricksters_punish | \n", "lnyear_firstpub | \n", "lnnmbr_title | \n", "
---|---|---|---|---|---|
3 | \n", "ALB | \n", "0.175841 | \n", "-0.028450 | \n", "7.537470 | \n", "3.978251 | \n", "
6 | \n", "ARG | \n", "0.183864 | \n", "-0.021799 | \n", "7.537475 | \n", "3.611511 | \n", "
7 | \n", "ARM | \n", "0.180377 | \n", "0.001109 | \n", "7.536747 | \n", "3.793288 | \n", "
10 | \n", "AUS | \n", "0.385149 | \n", "-0.015488 | \n", "7.532831 | \n", "3.636905 | \n", "
11 | \n", "AUT | \n", "0.293364 | \n", "-0.026174 | \n", "7.520525 | \n", "4.067886 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
193 | \n", "VNM | \n", "0.390781 | \n", "-0.022763 | \n", "7.542804 | \n", "3.477649 | \n", "
195 | \n", "YEM | \n", "0.339316 | \n", "0.033566 | \n", "7.549919 | \n", "2.764451 | \n", "
196 | \n", "ZAF | \n", "0.192996 | \n", "-0.021237 | \n", "7.533932 | \n", "3.457292 | \n", "
197 | \n", "ZMB | \n", "0.109273 | \n", "-0.058759 | \n", "7.556819 | \n", "2.716290 | \n", "
198 | \n", "ZWE | \n", "0.084122 | \n", "-0.061962 | \n", "7.552801 | \n", "3.221077 | \n", "
104 rows × 5 columns
\n", "\n", " | lntrust_wvsavg | \n", "tricksters_punish | \n", "lnyear_firstpub | \n", "lnnmbr_title | \n", "
---|---|---|---|---|
count | \n", "104.000000 | \n", "104.000000 | \n", "104.000000 | \n", "104.000000 | \n", "
mean | \n", "0.228953 | \n", "-0.020420 | \n", "7.540038 | \n", "3.397068 | \n", "
std | \n", "0.107977 | \n", "0.019649 | \n", "0.011507 | \n", "0.588741 | \n", "
min | \n", "0.055010 | \n", "-0.061962 | \n", "7.487174 | \n", "1.393201 | \n", "
25% | \n", "0.148913 | \n", "-0.028557 | \n", "7.535173 | \n", "3.094192 | \n", "
50% | \n", "0.209711 | \n", "-0.021608 | \n", "7.538700 | \n", "3.543206 | \n", "
75% | \n", "0.290824 | \n", "-0.011935 | \n", "7.546710 | \n", "3.761305 | \n", "
max | \n", "0.519469 | \n", "0.041841 | \n", "7.568855 | \n", "4.599667 | \n", "
\n", " | risktaking | \n", "challenge_competition | \n", "lnyear_firstpub | \n", "lnnmbr_title | \n", "
---|---|---|---|---|
count | \n", "76.000000 | \n", "76.000000 | \n", "76.000000 | \n", "76.000000 | \n", "
mean | \n", "0.012658 | \n", "0.057511 | \n", "7.540360 | \n", "3.384763 | \n", "
std | \n", "0.301881 | \n", "0.015940 | \n", "0.011473 | \n", "0.538440 | \n", "
min | \n", "-0.792435 | \n", "0.005366 | \n", "7.487174 | \n", "1.393201 | \n", "
25% | \n", "-0.157406 | \n", "0.048737 | \n", "7.535480 | \n", "3.074302 | \n", "
50% | \n", "-0.019577 | \n", "0.059116 | \n", "7.538700 | \n", "3.416417 | \n", "
75% | \n", "0.163387 | \n", "0.066100 | \n", "7.549553 | \n", "3.692096 | \n", "
max | \n", "0.970596 | \n", "0.113599 | \n", "7.560711 | \n", "4.599667 | \n", "
\n", " | fem19 | \n", "malebias | \n", "lnyear_firstpub | \n", "lnnmbr_title | \n", "
---|---|---|---|---|
count | \n", "174.000000 | \n", "174.000000 | \n", "174.000000 | \n", "174.000000 | \n", "
mean | \n", "51.511448 | \n", "0.179793 | \n", "7.543276 | \n", "3.181844 | \n", "
std | \n", "15.755623 | \n", "0.054670 | \n", "0.013249 | \n", "0.642237 | \n", "
min | \n", "5.827000 | \n", "0.044444 | \n", "7.487174 | \n", "1.305195 | \n", "
25% | \n", "44.729750 | \n", "0.142352 | \n", "7.536789 | \n", "2.815713 | \n", "
50% | \n", "53.423500 | \n", "0.187226 | \n", "7.543113 | \n", "3.257886 | \n", "
75% | \n", "60.614251 | \n", "0.210022 | \n", "7.551734 | \n", "3.637969 | \n", "
max | \n", "84.160004 | \n", "0.310007 | \n", "7.590555 | \n", "4.599667 | \n", "
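
The three summaries above describe samples built from pairs of a country outcome and a folklore measure (`lntrust_wvsavg` with `tricksters_punish`, `risktaking` with `challenge_competition`, `fem19` with `malebias`), each with `lnyear_firstpub` and `lnnmbr_title` alongside. Dropping rows that miss any of these variables would explain the differing counts (104, 76 and 174). Since the notebook imports scikit-learn's `LinearRegression`, a plausible next step is one OLS fit per pair. The sketch below does this on synthetic data; the pairing, the use of the two variables as controls and the helper `fit_pair` are assumptions inferred from the column selection, not the notebook's confirmed specification.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

CONTROLS = ["lnyear_firstpub", "lnnmbr_title"]
# (outcome, folklore measure) pairs inferred from the column selection above.
PAIRS = [
    ("lntrust_wvsavg", "tricksters_punish"),
    ("risktaking", "challenge_competition"),
    ("fem19", "malebias"),
]


def fit_pair(df: pd.DataFrame, outcome: str, regressor: str):
    """Drop rows missing any variable, then fit outcome ~ regressor + controls by OLS."""
    sample = df[[outcome, regressor] + CONTROLS].dropna()
    X = sample[[regressor] + CONTROLS]
    model = LinearRegression().fit(X, sample[outcome])
    return len(sample), dict(zip(X.columns, model.coef_))


# Synthetic country data, NOT the notebook's data; outcomes are exact linear
# functions so the fitted slopes are known in advance.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "lnyear_firstpub": rng.normal(7.54, 0.01, n),
    "lnnmbr_title": rng.normal(3.4, 0.6, n),
    "tricksters_punish": rng.normal(-0.02, 0.02, n),
    "challenge_competition": rng.normal(0.06, 0.016, n),
    "malebias": rng.normal(0.18, 0.05, n),
})
df["lntrust_wvsavg"] = 0.2 - 1.5 * df["tricksters_punish"] + 0.01 * df["lnnmbr_title"]
df["risktaking"] = 3.0 * df["challenge_competition"]
df["fem19"] = 50.0 - 20.0 * df["malebias"]
df.loc[:9, "risktaking"] = np.nan  # mimic missing survey coverage in 10 countries

for outcome, regressor in PAIRS:
    n_obs, coefs = fit_pair(df, outcome, regressor)
    print(f"{outcome} ~ {regressor}: n={n_obs}, slope={coefs[regressor]:.3f}")
```

`LinearRegression` returns point estimates only; standard errors would need a package such as statsmodels.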