We hear the words "big data" all the time, but few people realize what order of magnitude "big" has come to mean as the Internet has developed. Setting aside the minority of companies that use the term only for show, the enterprises that really hold big data handle numbers most people would never encounter in a lifetime: users upload 1 billion pictures a day to WeChat Moments, Alipay's peak daily transaction volume exceeds 20 billion yuan, and JD.com receives millions of new product images every day......
For artificial intelligence algorithms, which urgently need training data, these figures are very good news. They also mean that data has become just as important to artificial intelligence as computing power, and that algorithm development is advancing rapidly. But how do we filter out what is really useful to us from such a vast array of data? How do we analyze that data to make decisions in our own favor? This is the work data scientists do.
For this issue of our open class we invited Pan Rong, Chief Scientist of iPIN. He received his Doctor of Science degree from Sun Yat-sen University in late 2004. He then did research in data mining and artificial intelligence, first at the Hong Kong University of Science and Technology (from February 2005) and then at HP Labs in the United States (from 2007). In October 2009 he joined Sun Yat-sen University through its Hundred Talents Program and served in the Department of Computer Science. He became Chief Scientist of iPIN in 2014.
In 2005 Dr. Pan Rong took part in KDD Cup, the international data mining competition organized by the Association for Computing Machinery (ACM) and the most important annual data mining contest in the world. That year's task was search engine query classification, and he went on to win first prize in all three categories (query classification accuracy, performance, and creativity). He holds two US patents. He has published more than 20 academic papers in international conferences and journals in the field, including Artificial Intelligence, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Information Systems, AAAI, IJCAI, ACM SIGKDD, UAI, and ICDM. He is also a reviewer or program committee member for multiple journals and conferences, including IEEE Transactions on Knowledge and Data Engineering, IEEE/ACM......, AAAI, IJCAI, ICDM, WSDM, CIKM, ECML, ACML, BMWT, AAIM, PRICAI, WI, and WINE.
If you would also like to talk with our community of elite, high-achieving users, or to become a guest speaker sharing industry insights in our Chief Scientist series, please send an e-mail to email@example.com
▎ After you graduated, you first went to the Hong Kong University of Science and Technology. What was your core research there?
First I would like to thank my doctoral supervisors, Professor Li and Professor Yao Zhengan. Although data mining and machine learning are not their fields, I am very grateful for the latitude they gave me in choosing a research direction and for their practical guidance. My doctoral research was on kernel-based machine learning algorithms. After I moved to HKUST, under the guidance of Professor Yang Qiang, I worked on kernel-based methods for case-based reasoning.
Meanwhile, we also worked on search engine query classification, and in partnership with NEC we studied semi-supervised learning algorithms for sequences and applied them to wireless indoor localization. The query classification problem comes from the needs of large search companies such as Google, Baidu, Yahoo, and Microsoft; the goal is to improve the precision of advertising and the ranking quality of search results. For my time at HKUST I would like to thank Professor Yang Qiang for his guidance and help. That period exercised and improved my research skills in data mining and machine learning, such as surveying a direction, discovering problems, carrying out research, and writing papers.
Then came HP Labs. What got you your ticket there?
I was lucky enough to take part in the 2005 ACM KDD Cup while at HKUST and to win first place in all three categories. It really did have a great impact on my research direction and results, and it earned me the chance to go to HP Labs.
What was your core research at HP Labs?
The project I joined at HP Labs was called Chameleon; it was actually a research project on personalized recommendation algorithms. This was still the PC era: one in every five PCs on the global market was made by HP, and in the United States the share was higher, about one in four. Much as on today's mobile Internet, HP could, with the user's permission, collect data on all kinds of user behavior on the PC and then provide personalized recommendation services to improve the user experience. At that time, recommender system algorithms relied on user rating data: the recommender could work effectively only if users gave a rating after consuming a product or service. While working on Chameleon we found that most users skipped the rating step. That is quite reasonable, since many people do not bother to rate after a purchase or an experience. So I asked how we could still make recommendations when there are no user ratings. I then proposed the One-Class Collaborative Filtering (OCCF) algorithm, published at ICDM'08, and later, to address efficiency, a new accelerated OCCF algorithm, which was accepted by KDD'09.
The experience of working at HP Labs further strengthened my ability and confidence to enter relatively new fields of study, including problem-solving, mathematics, algorithms, analysis, and engineering skills.
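To make the one-class setting described above concrete, here is a minimal sketch in plain Python. It is not the published OCCF algorithm; it only illustrates one idea commonly used in this setting: every unobserved user-item pair is treated as a weak negative example with a small weight, and a low-rank matrix factorization is fitted to the weighted data. The function names (`train_occf`, `score`, `recommend`), the uniform `missing_weight`, the use of plain SGD, and all hyperparameters are assumptions made for illustration.

```python
import random


def train_occf(positives, n_users, n_items, rank=2, missing_weight=0.1,
               lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user/item factors to one-class (implicit) feedback.

    positives: iterable of (user, item) pairs that were observed.
    Unobserved pairs are treated as target 0 with a small weight
    (an illustrative choice, not the published weighting scheme).
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    observed = set(positives)
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                if (u, i) in observed:
                    target, weight = 1.0, 1.0
                else:
                    target, weight = 0.0, missing_weight
                pred = sum(a * b for a, b in zip(U[u], V[i]))
                err = weight * (target - pred)
                # One SGD step on the weighted squared error with L2 penalty.
                for k in range(rank):
                    uk, vk = U[u][k], V[i][k]
                    U[u][k] += lr * (err * vk - reg * uk)
                    V[i][k] += lr * (err * uk - reg * vk)
    return U, V


def score(U, V, u, i):
    """Predicted preference of user u for item i."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def recommend(U, V, u, positives, n_items):
    """Rank the items user u has not interacted with, best first."""
    seen = {i for (uu, i) in positives if uu == u}
    candidates = [i for i in range(n_items) if i not in seen]
    return sorted(candidates, key=lambda i: score(U, V, u, i), reverse=True)


if __name__ == "__main__":
    # Toy data: users 0-1 use items 0-1, users 2-3 use items 2-3,
    # except that user 3 has not (yet) touched item 3.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2)]
    U, V = train_occf(data, n_users=4, n_items=4)
    print(recommend(U, V, 3, data, n_items=4))
```

The small weight on unobserved pairs reflects the point made in the interview: a missing rating is only weak evidence of dislike, because many users simply never rate.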
Why did you leave HP and choose to go back to a university?
One reason is my own personality: I like independence and autonomy in research and do not like dealing with complicated interpersonal relationships. The other reasons are family and the working environment: Sun Yat-sen University has a relaxed, free academic atmosphere and good research conditions, and the students are very good too.
How have these experiences helped in your work as Chief Scientist at iPIN?
After returning to Sun Yat-sen University, building on my experience at HKUST and HP Labs, my laboratory focused on collaborative filtering, information retrieval, natural language processing, and so on. Over the years I accumulated further research experience, as well as experience in selecting and training students. I remember that when I started graduate school, my supervisor Professor Yao Zhengan told us, "There are no bad students, only bad teachers."
At the time I wondered how Professor Yao dared to say that, but he really did help me a great deal, even though the research directions of us doctoral students were not quite the same as his. I have always remembered his words, and when I became a teacher myself I held myself to them. As it happens, I do research on personalization algorithms, so I try to teach each student according to their aptitude, and this has trained my ability to develop talent.
At Sun Yat-sen University I mainly teach courses on databases, data mining, and information retrieval. Based on our own research, I try to add some new material every year, which lets students learn advanced knowledge and also helps me organize my own research.
▎ Even a leading academic like Yan Shuicheng has said that he could not help feeling uneasy about moving from academia to industry. You entered industry in 2014 as CDO of a startup. Where did your determination come from?
Actually I had the same feeling. But because of my earlier experience, I like answering practical questions and being oriented toward solving real problems. In addition, in the current environment, the real difficulty universities face is that it is industry that has the real data and the real application problems.
When did you first feel that you were capable of working in industry and worthy of the title of Chief Scientist? Was there a landmark event?
To be honest, I don't think I have lived up to this title yet. Perhaps this question should be answered by my partners, or by the future. Thank you!
What specifically is different between academia and industry? What is the same?
My current directions, data mining, machine learning, and natural language processing, all rely heavily on large amounts of data. In terms of problems, those from industry are more practical and more direct, while academia focuses more on basic research.
In terms of problem-solving, industry focuses more on striking a good balance between the cost of a solution and its effect, while academia pursues more innovation in the algorithms themselves.
From young academic scientist to Chief Scientist at a startup: did you run into difficulties while growing into the role?
It is not necessarily about growth; it may be more about how to adapt to the change of role.
First of all, you need good partners: you should be largely consistent in character and values, and complementary in roles. In addition, a university's main job is to train talent and send it out into society. At iPIN, building the team is the core task, and you must handle both selecting people and training them. Meanwhile, as mentioned above, enterprises are very cost-conscious (money, people, and time). Your task is not simply to publish papers; it is more important to consider whether your solution can actually be put into practice.
What does a real company require of a Chief Scientist?
I am not there yet myself. I think the person's academic level should have enough depth and breadth. They should keep learning, be able to understand practical problems and turn them into machine learning problems, and propose and screen solutions. They should also follow trends in academia and industry, have insight into the future direction of research and technology, form their own judgment, and plan ahead.
As a researcher in big data, how do you go about choosing a direction in which to turn research into products?
First of all, I would like to briefly introduce iPIN's products:
Perfect Volunteer, HaoHR, and Compass. Perfect Volunteer is a college application tool that tailors an application plan for students taking the college entrance examination and lets them learn about employment prospects in advance. By analyzing data on 40 million past college students with an exclusive database and innovative algorithms, it helps candidates choose the right university and major more scientifically and efficiently.
HaoHR is a free resume screening product for HR that intelligently matches more suitable resumes. It uses semantic analysis to interpret hiring requirements and build an intelligent talent profile, helping HR quickly find more candidates whose experience fits the job description. It simplifies resume search and screening so that HR can spend time on more worthwhile work.
Compass is a product that matches job opportunities to a user's experience and supports career planning. It uses artificial intelligence and semantic analysis to interpret a job seeker's experience in depth, helping job seekers find more and better opportunities comprehensively, accurately, and quickly. Using big data analysis of the career histories of hundreds of millions of people and of market trends, it saves job seekers time and worry when making career decisions.
Those are the three products. They differ in form, but their core is the same: analysis and mining of workplace talent data. In 2013 we decided to use talent data to build an economic map of China. After completing that step in early 2014, we went on to develop related products such as the ones above.
Please use a concrete example to show how far data mining has to go before it can really deliver value. How does data pass through collection, aggregation, architecture, machine learning, natural language processing, complex data analysis, predictive modeling, distributed computing, visualization, data applications, and other steps to become something end users find valuable?
Let me talk specifically about Perfect Volunteer. In 2015 it was the most widely used college application tool for the college entrance examination. It tailors an application plan to each student and lets them see in advance the employment information for more than 100,000 programs at more than 2,500 universities; many users call it a "magic tool" for the college entrance examination. Built by the team of scientists at the artificial intelligence company iPIN, Perfect Volunteer applies a "golden rule" for filling in applications that has five steps: admission probability prediction, personal preference filtering, personality-career matching, employment prospects analysis, and application strategy selection. These five steps reflect users' real needs, as identified by our analysis of user research. With them, Perfect Volunteer helps students and parents choose more scientifically and reasonably, and truly navigate toward their dreams.
Is there a universal methodology, or some experience unique to you?
First of all, choose something you are familiar with or have had similar experience in. You need to have the relevant data, there must be a market, and competition should be moderate (consider the level of competition in the education market). You also have to learn to go with the flow: real-world factors keep changing in the course of starting up, so you have to be flexible.
Student question: how does one come up with an algorithm? When I read papers I often find very clever algorithms that I would never have thought of in other scenarios. I would like to know how such an algorithm is born.
In fact this can be a long process. We often speak of standing on the shoulders of giants and taking one small step. First we have to climb onto the giants' shoulders; only once we are up there does taking a small step make sense. So we need to know what work has been done in a field (or several), who the major and influential researchers are, and which are the top conferences and journals. With the Internet as developed as it is, this should not be too difficult. Then there is a large body of past books and literature that we need to read, absorb, and digest.
This process involves choosing a direction and finding research points (or problems). Reading a lot of papers trains this ability: you should not only learn the strengths of the literature but also get into the habit of questioning it, asking which assumptions in a paper do not hold and which algorithms have room for improvement.
In addition, this process should be combined with practical applications to arrive at your specific research problem (equivalent to having climbed onto the giants' shoulders and being ready to take one small step). Thinking up an algorithm to solve the problem and verifying its validity is, comparatively, the easier part. Of course many details are involved, which I will not go into here. If you are interested, you can refer to Eamonn Keogh's 2012 KDD tutorial on how to do data mining research.
Student question: data mining is not a clearly defined discipline. If we choose this direction, which courses should be required, and which should be electives?
Required: probability and statistics, databases, machine learning, and pattern recognition (with prerequisites such as programming, data structures and algorithms, computer architecture, and computer networks).
Elective: GPU/parallel computing, data warehousing, data visualization, deep learning, business intelligence (BI), swarm intelligence (CI), and courses for specific application areas such as information retrieval, NLP, speech, and images.