[Knowledge graph] Practice: a question-answering system based on a medical knowledge graph (Part 2): graph data preparation and import
2022-07-24 02:24:00 【Coriander Chrysanthemum】
Previous article:
Background
The previous article covered the environment setup for the system. This part covers how the graph data is obtained; it is mainly crawled from http://jib.xywy.com/.
Environment setup
The original plan was to run the data-crawling code as well, so the following configuration was prepared.
The database chosen for storing the data is MongoDB. As usual, I install it in a container with Docker; for the steps, see: Docker install MongoDB. Then install the MongoDB driver: pip install pymongo. Because we need to connect to the database, we follow the earlier practice and create a configuration file.
KGQAMedicine/data/config.ini
[neo4j]
host=http://192.168.56.101
port=7474
user=neo4j
password=root
[mongodb]
; pymongo expects a bare host name or a mongodb:// URI here, not an http:// URL
host=192.168.56.101
port=27017
user=admin
password=123456
[sys]
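As a side note, the [mongodb] values can be combined into a single connection URI. The following is only a minimal sketch: build_mongo_uri is a hypothetical helper that is not part of the original project, and it assumes the credentials are the ones stored in config.ini. It also drops a scheme prefix such as http:// if one is present, since MongoDB URIs must start with mongodb://.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, user, password):
    """Build a mongodb:// URI from the [mongodb] settings (hypothetical helper)."""
    # Drop any scheme prefix such as "http://"; only the host part is needed.
    bare_host = host.split("://", 1)[-1]
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    return "mongodb://%s:%s@%s:%d/" % (
        quote_plus(user), quote_plus(password), bare_host, int(port))


uri = build_mongo_uri("192.168.56.101", 27017, "admin", "123456")
print(uri)  # mongodb://admin:123456@192.168.56.101:27017/
```

The resulting string could then be passed to pymongo.MongoClient(uri).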
KGQAMedicine/utils/config.py
#!/usr/bin/python
# -*- coding: UTF-8 -*-
"""
@author: juzipi
@file: config.py
@time: 2022/07/20
@description:
"""
from configparser import ConfigParser


class SysConfig(object):
    __doc__ = """ system config """

    # Singleton: one globally shared instance
    def __new__(cls, *args, **kwargs):
        if not hasattr(SysConfig, '_instance'):
            SysConfig._instance = object.__new__(cls)
        return SysConfig._instance

    # The path is relative, so scripts must be run from the project root
    config_parser = ConfigParser()
    config_parser.read("./data/config.ini")

    # neo4j
    NEO4J_HOST = config_parser.get("neo4j", 'host')
    NEO4J_PORT = int(config_parser.get("neo4j", 'port'))
    NEO4J_USER = config_parser.get("neo4j", 'user')
    NEO4J_PASSWORD = config_parser.get('neo4j', 'password')

    # mongodb
    MONGODB_HOST = config_parser.get("mongodb", 'host')
    MONGODB_PORT = int(config_parser.get("mongodb", 'port'))
    MONGODB_USER = config_parser.get("mongodb", 'user')
    MONGODB_PASSWORD = config_parser.get('mongodb', 'password')
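The import script later in this article reads two more attributes, SysConfig.DATA_ORIGIN_PATH and SysConfig.DATA_DICT_DIR, which this listing does not define. They presumably come from the [sys] section. The sketch below shows one way that could look; the key names data_origin_path and data_dict_dir are assumptions made up for illustration, and only the two paths themselves are taken from this article.

```python
from configparser import ConfigParser

# Hypothetical [sys] section: the key names are assumptions for illustration,
# they are not taken from the original config.ini.
SAMPLE_INI = """
[sys]
data_origin_path=./data/medicial.json
data_dict_dir=./data/dict
"""

parser = ConfigParser()
parser.read_string(SAMPLE_INI)

# In config.py these two values would become class attributes of SysConfig.
DATA_ORIGIN_PATH = parser.get("sys", "data_origin_path")
DATA_DICT_DIR = parser.get("sys", "data_dict_dir")

print(DATA_ORIGIN_PATH, DATA_DICT_DIR)
```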
Besides that, the requests library and lxml are used to crawl the data and parse the pages.
However, the original project is fairly old, and the pages it crawled may have changed since then. The data source website is given here; if you are interested in the data, you can crawl the site yourself. In this article we use the data that was already crawled and processed in the original project.
Data sources
The data in the project comes from the Xunyiwenyao ("Seek medical advice", xywy.com) website. Note that the website itself states that its content cannot be used as a basis for diagnosis or medical treatment.
I put the data from the original project into KGQAMedicine/data/medicial.json and added its path to the configuration file. Judging by its format, the data in this file appears to have been exported from MongoDB.
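The import code below calls json.loads on each line of this file, which suggests one JSON document per line, the layout that a MongoDB export typically has. Here is a minimal sketch of reading such a file. The two sample records are made up, and iter_records is a hypothetical helper; only the field names name and symptom come from the import code.

```python
import io
import json


def iter_records(lines):
    """Yield one dict per non-empty line of a JSON-lines export."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Two made-up records standing in for lines of the data file.
sample = io.StringIO(
    '{"_id": {"$oid": "000000000000000000000001"}, "name": "disease A", "symptom": ["s1", "s2"]}\n'
    '{"_id": {"$oid": "000000000000000000000002"}, "name": "disease B"}\n'
)

for record in iter_records(sample):
    # Optional fields may be absent, so read them with a default.
    print(record["name"], record.get("symptom", []))
```

Reading line by line keeps memory use low, because the whole file never has to be parsed as a single JSON value.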
Graph data import
The rewritten code for this part is as follows:
KGQAMedicine/get_data/build_graph.py
import json
import os

import tqdm
from py2neo import Graph, Node

from utils.config import SysConfig


class MedicalGraph(object):

    def __init__(self):
        self.data_path = SysConfig.DATA_ORIGIN_PATH
        self.graph = Graph(SysConfig.NEO4J_HOST + ":" + str(SysConfig.NEO4J_PORT),
                           auth=(SysConfig.NEO4J_USER, SysConfig.NEO4J_PASSWORD))
        self.raw_graph_data = None

    def _read_nodes(self):
        # 7 kinds of entity nodes in total
        drugs = []  # drugs
        foods = []  # foods
        checks = []  # medical checks
        departments = []  # departments
        producers = []  # producer-branded drug products
        diseases = []  # diseases
        symptoms = []  # symptoms
        disease_infos = []  # disease attributes

        # Relationships between entities
        relation_department_department = []  # department - parent department
        relation_diseases_noteat = []  # disease - food to avoid
        relation_diseases_doeat = []  # disease - food that is good to eat
        relation_diseases_recommandeat = []  # disease - recommended food
        relation_diseases_commonddrug = []  # disease - commonly used drug
        rels_recommanddrug = []  # disease - popular drug
        rels_check = []  # disease - check
        relation_drug_producer = []  # producer - drug
        rels_symptom = []  # disease - symptom
        rels_acompany = []  # disease - complication
        rels_category = []  # disease - department

        with open(self.data_path, 'r', encoding='utf8') as reader:
            # One JSON document per line
            for data in tqdm.tqdm(reader, desc=f"reading {self.data_path} file"):
                disease_dict = {}
                data_json = json.loads(data)
                disease = data_json['name']
                disease_dict['name'] = disease
                diseases.append(disease)
                disease_dict['desc'] = ''
                disease_dict['prevent'] = ''
                disease_dict['cause'] = ''
                disease_dict['easy_get'] = ''
                disease_dict['cure_department'] = ''
                disease_dict['cure_way'] = ''
                disease_dict['cure_lasttime'] = ''
                disease_dict['symptom'] = ''
                disease_dict['cured_prob'] = ''

                if 'symptom' in data_json:
                    symptoms += data_json['symptom']
                    for symptom in data_json['symptom']:
                        rels_symptom.append([disease, symptom])

                if 'acompany' in data_json:
                    for acompany in data_json['acompany']:
                        rels_acompany.append([disease, acompany])

                if 'desc' in data_json:
                    disease_dict['desc'] = data_json['desc']

                if 'prevent' in data_json:
                    disease_dict['prevent'] = data_json['prevent']

                if 'cause' in data_json:
                    disease_dict['cause'] = data_json['cause']

                if 'get_prob' in data_json:
                    disease_dict['get_prob'] = data_json['get_prob']

                if 'easy_get' in data_json:
                    disease_dict['easy_get'] = data_json['easy_get']

                if 'cure_department' in data_json:
                    cure_department = data_json['cure_department']
                    if len(cure_department) == 1:
                        rels_category.append([disease, cure_department[0]])
                    if len(cure_department) == 2:
                        # [parent department, sub-department]
                        big = cure_department[0]
                        small = cure_department[1]
                        relation_department_department.append([small, big])
                        rels_category.append([disease, small])

                    disease_dict['cure_department'] = cure_department
                    departments += cure_department

                if 'cure_way' in data_json:
                    disease_dict['cure_way'] = data_json['cure_way']

                if 'cure_lasttime' in data_json:
                    disease_dict['cure_lasttime'] = data_json['cure_lasttime']

                if 'cured_prob' in data_json:
                    disease_dict['cured_prob'] = data_json['cured_prob']

                if 'common_drug' in data_json:
                    common_drug = data_json['common_drug']
                    for drug in common_drug:
                        relation_diseases_commonddrug.append([disease, drug])
                    drugs += common_drug

                if 'recommand_drug' in data_json:
                    recommand_drug = data_json['recommand_drug']
                    drugs += recommand_drug
                    for drug in recommand_drug:
                        rels_recommanddrug.append([disease, drug])

                if 'not_eat' in data_json:
                    not_eat = data_json['not_eat']
                    for _not in not_eat:
                        relation_diseases_noteat.append([disease, _not])
                    foods += not_eat

                    # The three food fields are expected together; .get() avoids
                    # a KeyError if a record only has 'not_eat'
                    do_eat = data_json.get('do_eat', [])
                    for _do in do_eat:
                        relation_diseases_doeat.append([disease, _do])
                    foods += do_eat

                    recommand_eat = data_json.get('recommand_eat', [])
                    for _recommand in recommand_eat:
                        relation_diseases_recommandeat.append([disease, _recommand])
                    foods += recommand_eat

                if 'check' in data_json:
                    check = data_json['check']
                    for _check in check:
                        rels_check.append([disease, _check])
                    checks += check

                if 'drug_detail' in data_json:
                    # Each entry looks like "product(drug)": split it into the two parts
                    drug_detail = data_json['drug_detail']
                    producer = [i.split('(')[0] for i in drug_detail]
                    relation_drug_producer += [[i.split('(')[0], i.split('(')[-1].replace(')', '')]
                                               for i in drug_detail]
                    producers += producer

                disease_infos.append(disease_dict)

        # First 8 items: node data; remaining 11 items: relationship lists
        return (set(drugs), set(foods), set(checks), set(departments), set(producers),
                set(symptoms), set(diseases), disease_infos,
                rels_check, relation_diseases_recommandeat, relation_diseases_noteat,
                relation_diseases_doeat, relation_department_department,
                relation_diseases_commonddrug, relation_drug_producer, rels_recommanddrug,
                rels_symptom, rels_acompany, rels_category)

    def create_graph_nodes(self):
        if self.raw_graph_data is None:
            self.raw_graph_data = self._read_nodes()
        drugs, foods, checks, departments, producers, symptoms, diseases, disease_infos = \
            self.raw_graph_data[:8]
        # Disease nodes carry attributes; all other nodes only have a name
        self.create_diseases_nodes(disease_infos)
        self.create_node('Drug', drugs)
        self.create_node('Food', foods)
        self.create_node('Check', checks)
        self.create_node('Department', departments)
        self.create_node('Producer', producers)
        self.create_node('Symptom', symptoms)

    def create_node(self, label, nodes):
        for node_name in tqdm.tqdm(nodes, desc=f"creating {label} nodes"):
            node = Node(label, name=node_name)
            self.graph.create(node)

    def create_diseases_nodes(self, disease_infos):
        """
        Create the Disease nodes at the center of the knowledge graph
        :param disease_infos: list of dicts with the disease attributes
        :return:
        """
        for disease_dict in tqdm.tqdm(disease_infos, desc="creating diseases nodes"):
            node = Node("Disease", name=disease_dict['name'], desc=disease_dict['desc'],
                        prevent=disease_dict['prevent'], cause=disease_dict['cause'],
                        easy_get=disease_dict['easy_get'],
                        cure_lasttime=disease_dict['cure_lasttime'],
                        cure_department=disease_dict['cure_department'],
                        cure_way=disease_dict['cure_way'],
                        cured_prob=disease_dict['cured_prob'])
            self.graph.create(node)

    def create_graph_relations(self):
        if self.raw_graph_data is None:
            self.raw_graph_data = self._read_nodes()
        (rels_check, rels_recommandeat, rels_noteat, rels_doeat, rels_department,
         rels_commonddrug, rels_drug_producer, rels_recommanddrug, rels_symptom,
         rels_acompany, rels_category) = self.raw_graph_data[8:]
        self.create_relationship('Disease', 'Food', rels_recommandeat, 'recommand_eat', 'Recommended recipes')
        self.create_relationship('Disease', 'Food', rels_noteat, 'no_eat', 'Avoid eating')
        self.create_relationship('Disease', 'Food', rels_doeat, 'do_eat', 'Suitable for eating')
        self.create_relationship('Department', 'Department', rels_department, 'belongs_to', 'Belongs to')
        self.create_relationship('Disease', 'Drug', rels_commonddrug, 'common_drug', 'Common drugs')
        self.create_relationship('Producer', 'Drug', rels_drug_producer, 'drugs_of', 'Produces drugs')
        self.create_relationship('Disease', 'Drug', rels_recommanddrug, 'recommand_drug', 'Well-reviewed drugs')
        self.create_relationship('Disease', 'Check', rels_check, 'need_check', 'Diagnostic checks')
        self.create_relationship('Disease', 'Symptom', rels_symptom, 'has_symptom', 'Symptoms')
        self.create_relationship('Disease', 'Disease', rels_acompany, 'acompany_with', 'Complications')
        self.create_relationship('Disease', 'Department', rels_category, 'belongs_to', 'Department')

    def create_relationship(self, start_node, end_node, edges, rel_type, rel_name):
        """
        Create relationships
        :param start_node: label of the start node
        :param end_node: label of the end node
        :param edges: list of [start name, end name] pairs
        :param rel_type: relationship type
        :param rel_name: value of the relationship's name property
        :return:
        """
        # Deduplicate the edges
        set_edges = []
        for edge in edges:
            set_edges.append('###'.join(edge))
        for edge in tqdm.tqdm(set(set_edges),
                              desc=f"building edge {start_node} - {end_node} "
                                   f"rel type {rel_type} rel name {rel_name}"):
            edge = edge.split('###')
            p = edge[0]
            q = edge[1]
            # Note: names are formatted straight into the query, so a name
            # containing a single quote makes the statement fail
            query = "match(p:%s),(q:%s) where p.name='%s' and q.name='%s' create (p)-[rel:%s{name:'%s'}]->(q)" % (
                start_node, end_node, p, q, rel_type, rel_name)
            try:
                self.graph.run(query)
            except Exception as e:
                print(e)

    @staticmethod
    def _write(file_path, data_list):
        with open(file_path, 'w', encoding='utf8') as writer:
            writer.write("\n".join(data_list))

    def export_data_dict(self):
        if self.raw_graph_data is None:
            self.raw_graph_data = self._read_nodes()
        Drugs, Foods, Checks, Departments, Producers, Symptoms, Diseases = self.raw_graph_data[:7]
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "drug.txt"), list(Drugs))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "food.txt"), list(Foods))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "check.txt"), list(Checks))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "department.txt"), list(Departments))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "producer.txt"), list(Producers))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "symptoms.txt"), list(Symptoms))
        self._write(os.path.join(SysConfig.DATA_DICT_DIR, "disease.txt"), list(Diseases))
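One detail of create_relationship is worth a closer look: it deduplicates edges by joining the two names with '###' and formats the names directly into the Cypher text. A possible alternative, sketched below, deduplicates with tuples and passes the names as query parameters, so that quotes or '###' inside a name cause no trouble. This is not the original project's approach; dedupe_edges and build_edge_query are hypothetical helpers, the sample edges are made up, and the graph.run call in the comment is how I would expect this to be used with py2neo rather than something tested here.

```python
def dedupe_edges(edges):
    """Remove duplicate [start, end] pairs, keeping first-seen order."""
    return list(dict.fromkeys(tuple(edge) for edge in edges))


def build_edge_query(start_label, end_label, rel_type):
    """Return a Cypher statement whose names are supplied as parameters.

    Labels and relationship types cannot be parameterized in Cypher, so they
    are still formatted in; they come from the code, not from the data.
    """
    return (
        "MATCH (p:%s), (q:%s) "
        "WHERE p.name = $p_name AND q.name = $q_name "
        "CREATE (p)-[rel:%s {name: $rel_name}]->(q)"
    ) % (start_label, end_label, rel_type)


# Made-up edges; the duplicate pair is dropped by dedupe_edges.
edges = [["flu", "fever"], ["flu", "fever"], ["flu", "cough"]]
query = build_edge_query("Disease", "Symptom", "has_symptom")
for p_name, q_name in dedupe_edges(edges):
    params = {"p_name": p_name, "q_name": q_name, "rel_name": "symptoms"}
    # With py2neo this would be roughly: graph.run(query, **params)
    print(query, params)
```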
This program is mainly a rewrite of the original one. Because importing into the database takes a long time, I did not run the database import part; I only exported the entities to the KGQAMedicine/data/dict directory. Interested readers can try running the import themselves. Using the data imported in the first article, executing MATCH p=()-->() RETURN p LIMIT 200 against the graph database gives the following result:

There is still a lot of content .
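On the long import time mentioned above: create_node and create_relationship send one request per node or edge, which is probably part of the cost. One option would be to send names in batches with Cypher's UNWIND. The sketch below only builds the batches and the query text; chunked and build_batch_node_query are hypothetical helpers, the drug names are placeholders, and the approach has not been benchmarked against this data.

```python
def chunked(items, size):
    """Split a sequence into lists of at most `size` items."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_node_query(label):
    """Cypher that creates one node per entry of the $names list parameter."""
    return "UNWIND $names AS name CREATE (n:%s {name: name})" % label


query = build_batch_node_query("Drug")
for batch in chunked(["drug A", "drug B", "drug C"], size=2):
    # With py2neo this would be roughly: graph.run(query, names=batch)
    print(query, batch)
```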